Learning a Classifier for Prediction of Maintainability Based on Static Analysis Tools

2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), Pub Date: 2019-05-25, DOI: 10.1109/ICPC.2019.00043

Markus Schnappinger, Mohd Hafeez Osman, A. Pretschner, Arnaud Fietzke

Pages: 243-248

Citations: 14

Abstract

Static code analysis tools are a popular aid for monitoring and controlling the quality of software systems. However, these tools provide only a large number of measurements, which developers must interpret to gain insight into the actual quality of the software. In cooperation with professional quality analysts, we manually inspected source code from three different projects and evaluated its maintainability. We then trained machine learning algorithms to predict the human maintainability evaluation of program classes based on code metrics. The code metrics include structural metrics such as nesting depth, cloning information, and abstractions like the number of code smells. We evaluated this approach on a dataset of more than 115,000 lines of code. Our model predicts up to 81% of the threefold labels correctly and achieves a precision of 80%. Thus, we believe this is a promising contribution towards automated maintainability prediction. In addition, we analyzed the attributes in our created dataset and identified the features with the highest predictive power, i.e., code clones, method length, and the number of alerts raised by the tool Teamscale. This insight provides valuable help for users who need to prioritize tool measurements.
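To make the two reported figures concrete, the following is a minimal sketch of how accuracy and precision can be computed for a three-label prediction task. It is not the authors' evaluation code. The label names `A`, `B`, `C`, the ten example ratings, and the choice of macro-averaged precision are all assumptions made for illustration; the abstract does not state how precision was averaged.

```python
from collections import Counter

# Hypothetical names for the three maintainability ratings (assumption).
LABELS = ("A", "B", "C")


def accuracy(y_true, y_pred):
    """Fraction of program classes whose predicted label equals the expert label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-label precision (assumed averaging scheme)."""
    predicted = Counter(y_pred)
    true_pos = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    per_label = [
        true_pos[label] / predicted[label] if predicted[label] else 0.0
        for label in labels
    ]
    return sum(per_label) / len(labels)


# Made-up expert ratings and model predictions for ten program classes.
expert = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
model = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "A"]

acc = accuracy(expert, model)
prec = macro_precision(expert, model)
print(f"accuracy={acc:.2f} macro_precision={prec:.2f}")
```

Accuracy counts every correct prediction equally, whereas macro-averaged precision gives each of the three labels the same weight regardless of how often it occurs, so the two numbers can diverge when the labels are imbalanced.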
