Improving defect prediction using temporal features and non linear models


Bernstein, Abraham; Ekanayake, Jayalath; Pinzger, Martin (2007). Improving defect prediction using temporal features and non linear models. In: Proceedings of the International Workshop on Principles of Software Evolution, Cavtat, Croatia, 1 September 2007, 11-18.

Abstract

Predicting the defects in the next release of a large software system is a very valuable asset for the project manager to plan her resources. In this paper we argue that temporal features (or aspects) of the data are central to prediction performance. We also argue that the use of non-linear models, as opposed to traditional regression, is necessary to uncover some of the hidden interrelationships between the features and the defects and maintain the accuracy of the prediction in some cases. Using data obtained from the CVS and Bugzilla repositories of the Eclipse project, we extract a number of temporal features, such as the number of revisions and number of reported issues within the last three months. We then use these data to predict both the location of defects (i.e., the classes in which defects will occur) as well as the number of reported bugs in the next month of the project. To that end we use standard tree-based induction algorithms in comparison with the traditional regression. Our non-linear models uncover the hidden relationships between features and defects, and present them in an easy-to-understand form. Results also show that using the temporal features our prediction model can predict whether a source file will have a defect with an accuracy of 99% (area under ROC curve 0.9251) and the number of defects with a mean absolute error of 0.019 (Spearman’s correlation of 0.96).
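To make the idea of a temporal feature concrete, the following is a minimal sketch and not the authors' implementation. It assumes the revision history is available as (file path, commit date) pairs, approximates "the last three months" as a 90-day window, and shows how a mean absolute error could be computed for predicted defect counts. The function names, the sample log, and the window length are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def revisions_in_window(revisions, cutoff, days=90):
    """Count revisions per file in the `days` days before `cutoff`.

    `revisions` is an iterable of (path, commit_date) pairs. The 90-day
    default is an assumed stand-in for "the last three months".
    """
    start = cutoff - timedelta(days=days)
    counts = Counter()
    for path, when in revisions:
        if start <= when < cutoff:  # window is [start, cutoff)
            counts[path] += 1
    return counts


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical revision log, for illustration only.
log = [
    ("A.java", date(2007, 1, 10)),
    ("A.java", date(2007, 2, 20)),
    ("B.java", date(2006, 9, 1)),   # falls outside the window
    ("B.java", date(2007, 3, 5)),
]
features = revisions_in_window(log, cutoff=date(2007, 4, 1))
```

A per-file count like `features` would be one column among several temporal features fed to a tree-based learner or a regression model; the same windowing could be applied to reported issues.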

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Language: English
Event End Date: 1 September 2007
Deposited On: 14 Mar 2013 11:38
Last Modified: 03 Aug 2017 08:34
Publisher: IEEE Computer Society
Other Identification Number: merlin-id:2729
