Development emails content analyzer: intention mining in developer discussions


Di Sorbo, Andrea; Panichella, Sebastiano; Visaggio, Corrado Aaron; Di Penta, Massimiliano; Canfora, Gerardo; Gall, Harald (2015). Development emails content analyzer: intention mining in developer discussions. In: IEEE/ACM International Conference on Automated Software Engineering, Lincoln, Nebraska, USA, 9 November 2015 - 13 November 2015.

Abstract

Written development communication (e.g. mailing lists, issue trackers) constitutes a valuable source of information for building recommenders for software engineers, for example ones aimed at suggesting experts or at redocumenting existing source code. In this paper we propose a novel, semi-supervised approach named DECA (Development Emails Content Analyzer) that uses Natural Language Parsing to classify the content of development emails according to their purpose (e.g. feature request, opinion asking, problem discovery, solution proposal, information giving, etc.), identifying email elements that can be used for specific tasks. A study based on data from Qt and Ubuntu highlights the high precision (90%) and recall (70%) of DECA in classifying email content, outperforming traditional machine learning strategies. Moreover, we successfully used DECA for redocumenting the source code of Eclipse and Lucene, improving the recall, while keeping high precision, of a previous approach based on ad-hoc heuristics.

Statistics

Citations

7 citations in Web of Science®
17 citations in Scopus®
21 citations in Microsoft Academic

Downloads

132 downloads since deposited on 15 Oct 2015
61 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Language: English
Event End Date: 13 November 2015
Deposited On: 15 Oct 2015 14:51
Last Modified: 14 Feb 2018 09:32
Publisher: s.n.
OA Status: Green
Publisher DOI: https://doi.org/10.1109/ASE.2015.12
Official URL: http://ase2015.unl.edu/
Other Identification Number: merlin-id:12395

Download

Content: Accepted Version
Filetype: PDF
Size: 1MB