Efficient software evolution analysis: algorithmic and visual tools for investigating fine-grained software histories


Alexandru, Carol V. Efficient software evolution analysis: algorithmic and visual tools for investigating fine-grained software histories. 2019, University of Zurich, Faculty of Economics.

Abstract

Software analysis and its diachronic sibling, software evolution analysis, rely heavily on data computed by processing existing software.
Countless tools have been created for the analysis of source code, binaries and other artifacts.
The majority of these tools are written for one particular programming language and their modus operandi typically comprises the analysis of artifacts contained in file system directories representing the current version of a software system.

Researchers repurpose these tools for investigating software evolution by analyzing multiple revisions over the lifetime of a project.
But even though changes between revisions are usually tiny compared to the size of the affected artifacts, existing software evolution analysis techniques usually rely on redundantly re-analyzing entire files at best, or entire projects at worst, for every additional revision analyzed.
These limitations of being tied to a single ecosystem and of treating software as a static, timeless construct affect how we do software evolution research: it often self-restricts, rather arbitrarily, to the analysis of only a subset of revisions, instead of the full, high-resolution history of a project.
Thus, there exist both a need and the potential for representing and analyzing software artifacts more efficiently.

In this thesis, we identify several processes in existing software evolution analysis pipelines that suffer from redundancies and inefficiencies.
We then develop purpose-agnostic solutions for improving these processes and combine them in a generic, reusable, and extensible analysis library, called LISA.
We evaluate our approach extensively by computing (and publishing) code metrics for millions of program revisions, testing its generalizability by supporting multiple types of artifacts, analyses and programming languages, and by applying our tool to conduct concrete code studies.

Our findings indicate that analyzing software evolution using traditional tools incurs significant redundancies.
We demonstrate that the individual techniques we present are generalizable to multiple programming languages and artifact types and that they can accelerate the processing of evolving software by multiple orders of magnitude.
Alongside these core findings, our research has resulted in a state-of-the-art, open-source software analysis library, a large public dataset of historical code metrics, and incremental advancements in understanding the pace of software evolution, developer behavior and the visualization of software evolution.

Statistics

Downloads

3 downloads since deposited on 20 Dec 2019
3 downloads in the past 12 months

Additional indexing

Item Type: Dissertation (monographical)
Referees: Gall Harald C, Meyer Bertrand
Communities & Collections: 03 Faculty of Economics > Department of Informatics; UZH Dissertations
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Language: English
Place of Publication: Zürich
Date: September 2019
Deposited On: 20 Dec 2019 13:01
Last Modified: 25 Aug 2020 14:44
Number of Pages: 226
OA Status: Closed
Other Identification Number: merlin-id:18875

Download

Closed Access: Download allowed only for UZH members

Content: Published Version
Filetype: PDF - Repository staff only until 31 October 2020
Size: 6MB
Embargo till: 2020-10-31