An empirical study on the impact of code duplication-aware refactoring practices on quality metrics

IF 4.3 · CAS Zone 2, Computer Science · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology · Pub Date: 2025-03-18 · DOI: 10.1016/j.infsof.2025.107687

Eman Abdullah AlOmar

Volume 182, Article 107687 · Journal Article · https://www.sciencedirect.com/science/article/pii/S0950584925000266

Citations: 0

Abstract

Context:

Code refactoring is widely recognized as an essential software engineering practice that improves the understandability and maintainability of source code. Several studies have attempted to detect refactoring activities by mining software repositories, making it possible to collect, analyze, and derive actionable, data-driven insights about refactoring practices within software projects.

Objective:

Our goal is to identify, among the various quality models presented in the literature, those that align with developers' view of eliminating duplicate code when they explicitly state that they are refactoring the code to address it.

Method:

We extract a corpus of 332 refactoring commits applied and documented by developers during their daily changes from 128 open-source Java projects. In particular, we extract 32 structural metrics from which we identify code duplicate removal commits with their corresponding refactoring operations, as perceived by software engineers. Thereafter, we empirically analyze the impact of these refactoring operations on a set of common state-of-the-art design quality metrics.

Results:

The statistical analysis of the results shows that (i) some state-of-the-art metrics are capable of capturing the developer's intention of removing code duplication; and (ii) some metrics are emphasized more than others. We confirm that various structural metrics can effectively represent code duplication, leading to different impacts on software quality. Some metrics contribute to improvements, while others may lead to degradation.

Conclusion:

Most of the mapped metrics associated with the main quality attributes successfully capture developers’ intentions for removing code duplicates, as is evident from the commit messages. However, certain metrics do not fully capture these intentions.
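The before/after comparison described under Method can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not name the statistical test used, so an exact two-sided sign test stands in for it; the function names `classify_changes` and `sign_test_p_value` are hypothetical; and the LOC values are made up, not data from the study.

```python
from math import comb


def classify_changes(before, after, lower_is_better=True):
    """Count commits where a metric improved, degraded, or stayed the same."""
    improved = degraded = unchanged = 0
    for b, a in zip(before, after):
        if a == b:
            unchanged += 1
        elif (a < b) == lower_is_better:
            improved += 1
        else:
            degraded += 1
    return improved, degraded, unchanged


def sign_test_p_value(improved, degraded):
    """Exact two-sided sign test p-value; unchanged pairs are excluded."""
    n = improved + degraded
    if n == 0:
        return 1.0
    k = min(improved, degraded)
    # Probability of k or fewer successes under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical lines-of-code values for five duplicate-removal commits
# (illustrative only, not taken from the paper).
loc_before = [120, 95, 210, 80, 150]
loc_after = [100, 95, 180, 85, 130]

improved, degraded, unchanged = classify_changes(loc_before, loc_after)
p = sign_test_p_value(improved, degraded)
print(improved, degraded, unchanged, p)
```

Repeating this per metric would show which metrics tend to move in the direction developers intend and which do not, which is the kind of contrast the Results paragraph reports.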


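Source journal

Information and Software Technology (Engineering & Technology - Computer Science: Software Engineering)

CiteScore

9.10

Self-citation rate

7.70%

Articles published

164

Review time

9.6 weeks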

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:

• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.