Improving reliability of defect prediction models: from temporal reasoning and machine learning perspective


Ekanayake, Jayalath. Improving reliability of defect prediction models: from temporal reasoning and machine learning perspective. 2012, University of Zurich, Faculty of Economics.

Abstract

Software quality is an important factor because software systems play a key role in today’s world. There are several perspectives within the field on software quality measurement. One frequently used measurement (or metric) is the number of defects encountered in the software that could result in crashes, catastrophic failures, or security breaches. Testing the software for such defects is essential to enhance its quality. However, due to the rising complexity of software, manual testing has become an extremely time-consuming task and, consequently, many automatic supporting tools have been developed. One such supporting tool is the defect prediction model. A large number of defect prediction models can be found in the literature, and most of them share a common development procedure. In general, this procedure implicitly assumes that the underlying data distribution of software systems is relatively stable over time. This assumption does not necessarily hold and, consequently, the reliability of those models is doubtful at some points in time. In this thesis, therefore, we presented temporal, or time-based, reasoning techniques that improve the reliability of prediction models. By exploring four open source software (OSS) projects and one cost estimation dataset, we first showed that real-time based data sampling improves the prediction quality compared to random sampling. Also, temporal features are more appropriate than static features for defect prediction. Furthermore, we found that non-linear models are better than linear models for defect prediction. This implies that the relationship between project features and defects is not linear. Further investigations showed that prediction quality varies significantly over time and, hence, testing a model on one or a few data samples is not sufficient to generalize the model.
Specifically, we found that the project features influence the model’s prediction quality and, therefore, that the model’s prediction quality itself can be predicted. Finally, we turned these insights into a tool that estimates the prediction quality of models in advance. This tool helps developers determine when to apply their models and when not to. Our temporal-reasoning techniques can be easily adapted to most existing prediction models to enhance their reliability. In general, these techniques are easy to use, extensible, and show a high degree of flexibility in terms of customization to real applications. More importantly, we provided a tool that supports developers in making decisions about their prediction models in advance.
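
The contrast between time-based and random data sampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the record layout, the function names random_split and temporal_split, and the toy data are all assumptions made for illustration. The point is only that a time-based split trains on data observed before a cutoff date and tests on later data, while a random split mixes past and future records.

```python
import random
from datetime import date


def random_split(records, test_fraction=0.25, seed=0):
    """Shuffle the records, ignoring their dates, then cut into train/test."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def temporal_split(records, cutoff):
    """Train on records dated before the cutoff, test on the rest."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Hypothetical toy records: one entry per file revision, two per month.
records = [
    {"file": f"f{i}.c", "date": date(2011, 1 + i % 12, 1), "defective": i % 3 == 0}
    for i in range(24)
]

train, test = temporal_split(records, cutoff=date(2011, 10, 1))
r_train, r_test = random_split(records)
```

With the temporal split, every training record predates every test record, which mirrors how a model would actually be used: fitted on the project's past and applied to its future. The random split gives no such guarantee.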


Item Type: Dissertation (monographical)
Referees: Bernstein Abraham
Communities & Collections: 03 Faculty of Economics > Department of Informatics; UZH Dissertations
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Language: English
Date: 2012
Deposited On: 18 Feb 2013 08:26
Last Modified: 27 Aug 2021 08:41
Number of Pages: 181
OA Status: Closed
Other Identification Number: merlin-id:7946
Full text not available from this repository.