ZORA (Zurich Open Repository and Archive)

Mining file histories: should we consider branches?

Kovalenko, Vladimir; Palomba, Fabio; Bacchelli, Alberto (2018). Mining file histories: should we consider branches? In: ASE '18: 33rd ACM/IEEE International Conference on Automated Software Engineering, Montpellier, France, 3 October 2018 - 7 October 2018. ACM, 202-213.

Abstract

Modern distributed version control systems, such as Git, offer support for branching: the possibility to develop parts of software outside the master trunk. Taking the repository structure into account in Mining Software Repositories (MSR) studies requires a thorough approach to mining, yet there is no well-documented, widespread methodology for handling merge commits and branches. Moreover, it is still unknown to what extent considering branches during MSR studies affects their results. In this study, we set out to evaluate the importance of properly handling branches when calculating file modification histories. We analyze over 1,400 Git repositories from four open source ecosystems and compute modification histories for over two million files, using two different algorithms. One algorithm follows only the first parent of each commit when traversing the repository; the other returns the full modification history of a file across all branches. We show that the two algorithms consistently deliver different results, but the scale of the difference varies across projects and ecosystems. Further, we evaluate the importance of accurately mining file histories by comparing, for the two file history retrieval algorithms, the performance of common techniques that rely on file modification history (reviewer recommendation, change recommendation, and defect prediction). We find that considering full file histories leads to a rather modest increase in the techniques' performance.
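The abstract contrasts two ways of retrieving a file's modification history. The Python sketch below illustrates the difference on a hypothetical five-commit repository. The `Commit` model, the function names, and the convention that each commit's `changed` set is computed against its first parent are assumptions made for illustration; this is not the authors' implementation. The first-parent walk is only roughly analogous to what `git log --first-parent -- <path>` reports.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Commit:
    # Hypothetical toy model: parents[0] is the first parent, and `changed`
    # lists the files this commit modified relative to its first parent.
    sha: str
    parents: list[str] = field(default_factory=list)
    changed: set[str] = field(default_factory=set)


def first_parent_history(repo: dict[str, Commit], head: str, path: str) -> list[str]:
    """Commits touching `path`, reached by following only first parents."""
    history = []
    sha = head
    while sha is not None:
        commit = repo[sha]
        if path in commit.changed:
            history.append(sha)
        sha = commit.parents[0] if commit.parents else None
    return history


def full_history(repo: dict[str, Commit], head: str, path: str) -> list[str]:
    """Commits touching `path`, reached through any parent (all branches), in BFS order."""
    history = []
    seen = {head}
    queue = deque([head])
    while queue:
        sha = queue.popleft()
        commit = repo[sha]
        if path in commit.changed:
            history.append(sha)
        for parent in commit.parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return history


# Hypothetical history: C and D edit a.py on a feature branch that M merges back.
repo = {
    "A": Commit("A", [], {"a.py"}),
    "B": Commit("B", ["A"], {"b.py"}),       # change on the master side
    "C": Commit("C", ["A"], {"a.py"}),       # feature branch
    "D": Commit("D", ["C"], {"a.py"}),       # feature branch
    "M": Commit("M", ["B", "D"], {"a.py"}),  # merge; diff against first parent B shows a.py
}

print(first_parent_history(repo, "M", "a.py"))  # ['M', 'A']
print(full_history(repo, "M", "a.py"))          # ['M', 'D', 'A', 'C']
```

In this toy repository, the first-parent view attributes the feature-branch edits in C and D to the merge commit M, while the full traversal recovers them. Whether a merge commit should itself count as a modification is one of the handling choices the abstract describes as poorly documented. This sketch counts it in both views.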

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Scopus Subject Areas: Physical Sciences > Computational Theory and Mathematics; Physical Sciences > Human-Computer Interaction; Physical Sciences > Software
Scope: Discipline-based scholarship (basic research)
Language: English
Event End Date: 7 October 2018
Deposited On: 26 Jan 2021 10:51
Last Modified: 06 Mar 2024 14:33
Publisher: ACM
ISBN: 9781450359375
OA Status: Green
Publisher DOI: https://doi.org/10.1145/3238147.3238169
Other Identification Number: merlin-id:20232
Content: Published Version

Statistics

Citations

23 citations in Web of Science®
30 citations in Scopus®

Downloads

82 downloads since deposited on 26 Jan 2021
26 downloads in the past 12 months