Header

UZH-Logo

Maintenance Infos

A search-based training algorithm for cost-aware defect prediction


Panichella, Annibale; Alexandru, Carol V; Panichella, Sebastiano; Bacchelli, Alberto; Gall, Harald (2016). A search-based training algorithm for cost-aware defect prediction. In: Genetic and Evolutionary Computation Conference, Denver, 20 July 2016 - 24 July 2016, Epub ahead of print.

Abstract

Research has yielded approaches to predict future defects in software artifacts based on historical information, thus assisting companies in effectively allocating limited development resources and developers in reviewing each others’ code changes. Developers are unlikely to devote the same effort to inspect each software artifact predicted to contain defects, since the effort varies with the artifacts’ size (cost) and the number of defects it exhibits (effectiveness). We propose to use Genetic Algorithms (GAs) for training prediction models to maximize their cost-effectiveness. We evaluate the approach on two well-known models, Regression Tree and Generalized Linear Model, and predict defects between multiple releases of six open source projects. Our results show that regression models trained by GAs significantly outperform their traditional counterparts, improving the cost-effectiveness by up to 240%. Often the top 10% of predicted lines of code contain up to twice as many defects.

Abstract

Research has yielded approaches to predict future defects in software artifacts based on historical information, thus assisting companies in effectively allocating limited development resources and developers in reviewing each others’ code changes. Developers are unlikely to devote the same effort to inspect each software artifact predicted to contain defects, since the effort varies with the artifacts’ size (cost) and the number of defects it exhibits (effectiveness). We propose to use Genetic Algorithms (GAs) for training prediction models to maximize their cost-effectiveness. We evaluate the approach on two well-known models, Regression Tree and Generalized Linear Model, and predict defects between multiple releases of six open source projects. Our results show that regression models trained by GAs significantly outperform their traditional counterparts, improving the cost-effectiveness by up to 240%. Often the top 10% of predicted lines of code contain up to twice as many defects.

Statistics

Altmetrics

Downloads

58 downloads since deposited on 06 May 2016
30 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Event End Date:24 July 2016
Deposited On:06 May 2016 17:47
Last Modified:30 Jan 2017 08:17
Publisher DOI:https://doi.org/10.1145/2908812.2908938
Other Identification Number:merlin-id:13244

Download

Preview Icon on Download
Preview
Language: English
Filetype: PDF
Size: 356kB
View at publisher

TrendTerms

TrendTerms displays relevant terms of the abstract of this publication and related documents on a map. The terms and their relations were extracted from ZORA using word statistics. Their timelines are taken from ZORA as well. The bubble size of a term is proportional to the number of documents where the term occurs. Red, orange, yellow and green colors are used for terms that occur in the current document; red indicates high interlinkedness of a term with other terms, orange, yellow and green decreasing interlinkedness. Blue is used for terms that have a relation with the terms in this document, but occur in other documents.
You can navigate and zoom the map. Mouse-hovering a term displays its timeline, clicking it yields the associated documents.

Author Collaborations