How do Trivial Refactorings Affect Classification Prediction Models?

Darwin Pinheiro, C. Bezerra, Anderson G. Uchôa
{"title":"How do Trivial Refactorings Affect Classification Prediction Models?","authors":"Darwin Pinheiro, C. Bezerra, Anderson G. Uchôa","doi":"10.1145/3559712.3559720","DOIUrl":null,"url":null,"abstract":"Refactoring is defined as a transformation that changes the internal structure of the source code without changing the external behavior. Keeping the external behavior means that after applying the refactoring activity, the software must produce the same output as before the activity. The refactoring activity can bring several benefits, such as: removing code with low structural quality, avoiding or reducing technical debt, improving code maintainability, reuse or readability. In this way, the benefits extend to internal and external quality attributes. The literature on software refactoring suggests carrying out studies that invest in improving automated solutions for detecting and correcting refactoring. Furthermore, few studies investigate the influence that a less complex type of refactoring can have on predicting more complex refactorings. This paper investigates how less complex (trivial) refactorings affect the prediction of more complex (non-trivial) refactorings. To do this, we classify refactorings based on their triviality, extract metrics from the code, contextualize the data and train machine learning algorithms to investigate the effect caused. Our results suggest that: (i) machine learning with tree-based models (Random Forest and Decision Tree) performed very well when trained with code metrics to detect refactorings; (ii) separating trivial from non-trivial refactorings into different classes resulted in a more efficient model, indicative of improving the accuracy of automated solutions based on machine learning; and, (iii) using balancing techniques that increase or decrease samples randomly is not the best strategy to improve datasets composed of code metrics.","PeriodicalId":119656,"journal":{"name":"Proceedings of the 16th Brazilian Symposium on Software Components, Architectures, and Reuse","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th Brazilian Symposium on Software Components, Architectures, and Reuse","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3559712.3559720","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Refactoring is defined as a transformation that changes the internal structure of source code without changing its external behavior. Preserving the external behavior means that, after the refactoring is applied, the software must produce the same output as before. Refactoring can bring several benefits, such as removing code with low structural quality, avoiding or reducing technical debt, and improving code maintainability, reusability, or readability; in this way, the benefits extend to both internal and external quality attributes. The software refactoring literature calls for studies that improve automated solutions for detecting and correcting refactorings. Furthermore, few studies investigate the influence that less complex types of refactoring can have on predicting more complex refactorings. This paper investigates how less complex (trivial) refactorings affect the prediction of more complex (non-trivial) refactorings. To do this, we classify refactorings based on their triviality, extract metrics from the code, contextualize the data, and train machine learning algorithms to investigate the resulting effect. Our results suggest that: (i) machine learning with tree-based models (Random Forest and Decision Tree) performed very well when trained on code metrics to detect refactorings; (ii) separating trivial and non-trivial refactorings into different classes produced a more effective model, indicating that the accuracy of machine-learning-based automated solutions can be improved; and (iii) balancing techniques that randomly add or remove samples are not the best strategy for improving datasets composed of code metrics.
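
To make the pipeline the abstract describes concrete, below is a minimal sketch of training tree-based classifiers on code metrics to detect refactorings. The dataset file name, metric columns, and label values are illustrative assumptions, not the paper's actual setup; the random over-sampling step illustrates the kind of balancing technique the results caution against.

```python
# Minimal sketch, assuming a CSV of code metrics with a triviality label.
# The file name, metric columns, and label values are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imblearn.over_sampling import RandomOverSampler

df = pd.read_csv("refactoring_metrics.csv")  # hypothetical dataset
X = df.drop(columns=["label"])               # code metrics, e.g. LOC, WMC, CBO
y = df["label"]                              # e.g. "none", "trivial", "non_trivial"

# Stratified split preserves class proportions across train and test sets,
# which matters when non-trivial refactorings are rare.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Random over-sampling duplicates minority-class rows; the paper's results
# suggest this kind of random balancing is not the best strategy here.
X_bal, y_bal = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)

for model in (DecisionTreeClassifier(random_state=42),
              RandomForestClassifier(n_estimators=100, random_state=42)):
    model.fit(X_bal, y_bal)
    print(type(model).__name__)
    print(classification_report(y_test, model.predict(X_test)))
```

Comparing the classification reports with and without the over-sampling step is one simple way to observe the balancing effect the paper discusses.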