Header

UZH-Logo

Maintenance Infos

Redundancy-free analysis of multi-revision software artifacts


Alexandru, Carol V; Panichella, Sebastiano; Proksch, Sebastian; Gall, Harald C (2019). Redundancy-free analysis of multi-revision software artifacts. Empirical Software Engineering, 24(1):332-380.

Abstract

Researchers often analyze several revisions of a software project to obtain historical data about its evolution. For example, they statically analyze the source code and monitor the evolution of certain metrics over multiple revisions. The time and resource requirements for running these analyses often make it necessary to limit the number of analyzed revisions, e.g., by only selecting major revisions or by using a coarse-grained sampling strategy, which could remove significant details of the evolution. Most existing analysis techniques are not designed for the analysis of multi-revision artifacts and they treat each revision individually. However, the actual difference between two subsequent revisions is typically very small. Thus, tools tailored for the analysis of multiple revisions should only analyze these differences, thereby preventing re-computation and storage of redundant data, improving scalability and enabling the study of a larger number of revisions. In this work, we propose the Lean Language-Independent Software Analyzer (LISA), a generic framework for representing and analyzing multi-revisioned software artifacts. It employs a redundancy-free, multi-revision representation for artifacts and avoids re-computation by only analyzing changed artifact fragments across thousands of revisions. The evaluation of our approach consists of measuring the effect of each individual technique incorporated, an in-depth study of LISA resource requirements and a large-scale analysis over 7 million program revisions of 4,000 software projects written in four languages. We show that the time and space requirements for multi-revision analyses can be reduced by multiple orders of magnitude, when compared to traditional, sequential approaches.

Abstract

Researchers often analyze several revisions of a software project to obtain historical data about its evolution. For example, they statically analyze the source code and monitor the evolution of certain metrics over multiple revisions. The time and resource requirements for running these analyses often make it necessary to limit the number of analyzed revisions, e.g., by only selecting major revisions or by using a coarse-grained sampling strategy, which could remove significant details of the evolution. Most existing analysis techniques are not designed for the analysis of multi-revision artifacts and they treat each revision individually. However, the actual difference between two subsequent revisions is typically very small. Thus, tools tailored for the analysis of multiple revisions should only analyze these differences, thereby preventing re-computation and storage of redundant data, improving scalability and enabling the study of a larger number of revisions. In this work, we propose the Lean Language-Independent Software Analyzer (LISA), a generic framework for representing and analyzing multi-revisioned software artifacts. It employs a redundancy-free, multi-revision representation for artifacts and avoids re-computation by only analyzing changed artifact fragments across thousands of revisions. The evaluation of our approach consists of measuring the effect of each individual technique incorporated, an in-depth study of LISA resource requirements and a large-scale analysis over 7 million program revisions of 4,000 software projects written in four languages. We show that the time and space requirements for multi-revision analyses can be reduced by multiple orders of magnitude, when compared to traditional, sequential approaches.

Statistics

Citations

Dimensions.ai Metrics

Altmetrics

Downloads

5 downloads since deposited on 12 Jul 2018
3 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Uncontrolled Keywords:Software
Language:English
Date:1 February 2019
Deposited On:12 Jul 2018 08:28
Last Modified:05 Jul 2019 00:00
Publisher:Springer
ISSN:1382-3256
OA Status:Green
Free access at:Publisher DOI. An embargo period may apply.
Publisher DOI:https://doi.org/10.1007/s10664-018-9630-9
Other Identification Number:merlin-id:16486
Project Information:
  • : FunderSNSF
  • : Grant ID200021_149450
  • : Project TitleWhiteboard - An e-Science Framework for Replication of Software Studies
  • : FunderSNSF
  • : Grant ID200021_166275
  • : Project TitleSURF-MobileAppsData

Download

Download PDF  'Redundancy-free analysis of multi-revision software artifacts'.
Preview
Content: Published Version
Filetype: PDF
Size: 975kB
View at publisher