Phylogenetic Analysis of Reticulate Software Evolution
A. Mori, M. Hashimoto
2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)
Published: 2023-05-01
DOI: 10.1109/MSR59073.2023.00074 (https://doi.org/10.1109/MSR59073.2023.00074)
Abstract
In this paper, we apply techniques from phylogenetics to uncover evolutionary dependencies among software versions. Phylogenetics is a branch of computational molecular biology that infers evolutionary relationships among organisms from differences and similarities in DNA sequences and morphology. We apply a tree differencing technique to abstract syntax trees to compute a distance matrix, which a distance-based phylogenetic algorithm then uses to infer an evolution network. Such a network lets us identify merging and branching among versions without manually inspecting the source code. Experiments on ancient versions of the Emacs editor and on open-source 3D printer firmware show that we can reproduce the evolution of the software and identify code import and merging across different lineages. We also discuss how the techniques identify feature models among software variations. To the best of our knowledge, this paper is the first to report a reticulate phylogenetic analysis of software. It may offer a helpful method for gaining insight into software evolution.
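The pipeline described in the abstract (syntax trees, then a distance matrix, then a distance-based inference step) can be illustrated with a small sketch. This is not the authors' method: the abstract does not name the tree differencing technique or the phylogenetic algorithm, so the sketch substitutes a much cruder distance (the multiset difference of AST node types, using Python's standard `ast` module) and, instead of inferring a network, only checks the four-point condition, a standard test of whether a set of distances fits a tree. All function names here (`node_profile`, `profile_distance`, `distance_matrix`, `four_point_deviation`) are hypothetical.

```python
import ast
from collections import Counter


def node_profile(source):
    """Count the AST node types in a piece of Python source.

    Assumption: a bag of node types stands in for a real tree diff.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def profile_distance(a, b):
    """Size of the symmetric multiset difference between two profiles."""
    # Counter subtraction keeps only positive counts, so the sum of both
    # directions counts every unmatched node exactly once.
    return sum(((a - b) + (b - a)).values())


def distance_matrix(versions):
    """Map each ordered pair of version names to their profile distance.

    `versions` is a dict from a version name to its source text.
    """
    profiles = {name: node_profile(src) for name, src in versions.items()}
    return {
        (x, y): profile_distance(profiles[x], profiles[y])
        for x in profiles
        for y in profiles
    }


def four_point_deviation(d, a, b, c, e):
    """Gap between the two largest pairwise sums for a quartet.

    For distances that fit a tree exactly, the two largest of the three
    sums are equal, so the gap is 0. A positive gap means the quartet is
    not perfectly tree-like, which is the kind of conflict that network
    methods are designed to represent.
    """
    sums = sorted([
        d[a, b] + d[c, e],
        d[a, c] + d[b, e],
        d[a, e] + d[b, c],
    ])
    return sums[2] - sums[1]
```

A caller would pass a dict such as `{"v1": src1, "v2": src2, ...}` to `distance_matrix` and then inspect quartets of versions with `four_point_deviation`. A real analysis would replace the node-type profile with an actual tree edit distance and feed the matrix to a network inference algorithm.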