Reducing redundancies in multi-revision code analysis

2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER) | Pub Date: 2017-02-01 | DOI: 10.1109/SANER.2017.7884617

Carol V. Alexandru, Sebastiano Panichella, H. Gall

Pages: 148-159

Citations: 12

Abstract

Software engineering research often requires analyzing multiple revisions of several software projects, be it to make and test predictions or to observe and identify patterns in how software evolves. However, code analysis tools are almost exclusively designed for the analysis of one specific version of the code, and the time and resource requirements grow linearly with each additional revision to be analyzed. Thus, code studies often observe a relatively small number of revisions and projects. Furthermore, each programming ecosystem provides dedicated tools, hence researchers typically analyze code written in only one language, even when researching topics that should generalize to other ecosystems. To alleviate these issues, frameworks and models have been developed to combine analysis tools or automate the analysis of multiple revisions, but little research has gone into actually removing redundancies in multi-revision, multi-language code analysis. We present a novel end-to-end approach that systematically avoids redundancies every step of the way: when reading sources from version control, during parsing, in the internal code representation, and during the actual analysis. We evaluate our open-source implementation, LISA, on the full history of 300 projects, written in 3 different programming languages, computing basic code metrics for over 1.1 million program revisions. When analyzing many revisions, LISA requires less than a second on average to compute basic code metrics for all files in a single revision, even for projects consisting of millions of lines of code.
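To make the idea of avoiding redundant work across revisions concrete, here is a minimal Python sketch. It is not LISA's implementation and makes no claim about how LISA works internally. It assumes a toy model in which each revision is a mapping from file path to file content, and it caches a placeholder metric by content hash so that a file unchanged between revisions is analyzed only once. The names `Revision`, `count_lines`, and `analyze_history` are hypothetical and exist only for this illustration.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model: a revision maps file path -> file content.
Revision = Dict[str, str]


def count_lines(source: str) -> int:
    """Toy metric standing in for a real analysis: count non-blank lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def analyze_history(
    revisions: List[Revision],
    metric: Callable[[str], int] = count_lines,
) -> Tuple[List[Dict[str, int]], int]:
    """Compute `metric` for every file in every revision.

    Results are cached by content hash, so identical file content is
    analyzed only once, no matter how many revisions contain it.
    Returns the per-revision results and the number of times `metric`
    was actually invoked.
    """
    cache: Dict[str, int] = {}
    invocations = 0
    results: List[Dict[str, int]] = []
    for revision in revisions:
        per_file: Dict[str, int] = {}
        for path, content in revision.items():
            key = hashlib.sha1(content.encode("utf-8")).hexdigest()
            if key not in cache:
                cache[key] = metric(content)
                invocations += 1
            per_file[path] = cache[key]
        results.append(per_file)
    return results, invocations


history = [
    {"a.py": "x = 1\n", "b.py": "y = 2\n\nz = 3\n"},
    {"a.py": "x = 1\n", "b.py": "y = 2\n\nz = 3\nw = 4\n"},  # only b.py changed
    {"a.py": "x = 1\n", "b.py": "y = 2\n\nz = 3\nw = 4\n"},  # nothing changed
]
results, invocations = analyze_history(history)
print(invocations, "metric computations for", sum(len(r) for r in history), "file revisions")
```

In this toy history the metric runs three times for six file revisions, because the unchanged files are served from the cache. The abstract describes removing redundancy at several further stages (reading from version control, parsing, the internal representation); this sketch covers only the simplest form of the idea at the analysis stage.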

