{"title":"An empirical study on the impact of code duplication-aware refactoring practices on quality metrics","authors":"Eman Abdullah AlOmar","doi":"10.1016/j.infsof.2025.107687","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><div>Code refactoring is widely recognized as an essential software engineering practice that improves the understandability and maintainability of source code. Several studies attempted to detect refactoring activities through mining software repositories, allowing one to collect, analyze, and get actionable data-driven insights about refactoring practices within software projects.</div></div><div><h3>Objective:</h3><div>Our goal is to identify, among the various quality models presented in the literature, the ones that align with the developer’s vision of eliminating duplicates of code, when they explicitly mention that they refactor the code to improve them.</div></div><div><h3>Method:</h3><div>We extract a corpus of 332 refactoring commits applied and documented by developers during their daily changes from 128 open-source Java projects. In particular, we extract 32 structural metrics from which we identify code duplicate removal commits with their corresponding refactoring operations, as perceived by software engineers. Thereafter, we empirically analyze the impact of these refactoring operations on a set of common state-of-the-art design quality metrics.</div></div><div><h3>Results:</h3><div>The statistical analysis of the results obtained shows that (i) some state-of-the-art metrics are capable of capturing the developer’s intention of removing code duplication; and (ii) some metrics are being more emphasized than others. We confirm that various structural metrics can effectively represent code duplication, leading to different impacts on software quality. Some metrics contribute to improvements, while others may lead to degradation.</div></div><div><h3>Conclusion:</h3><div>Most of the mapped metrics associated with the main quality attributes successfully capture developers’ intentions for removing code duplicates, as is evident from the commit messages. However, certain metrics do not fully capture these intentions.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"182 ","pages":"Article 107687"},"PeriodicalIF":4.3000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584925000266","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Context:
Code refactoring is widely recognized as an essential software engineering practice that improves the understandability and maintainability of source code. Several studies attempted to detect refactoring activities through mining software repositories, allowing one to collect, analyze, and get actionable data-driven insights about refactoring practices within software projects.
Objective:
Our goal is to identify, among the various quality models presented in the literature, the ones that align with the developer’s vision of eliminating duplicates of code, when they explicitly mention that they refactor the code to improve them.
Method:
We extract a corpus of 332 refactoring commits applied and documented by developers during their daily changes from 128 open-source Java projects. In particular, we extract 32 structural metrics from which we identify code duplicate removal commits with their corresponding refactoring operations, as perceived by software engineers. Thereafter, we empirically analyze the impact of these refactoring operations on a set of common state-of-the-art design quality metrics.
Results:
The statistical analysis of the results obtained shows that (i) some state-of-the-art metrics are capable of capturing the developer’s intention of removing code duplication; and (ii) some metrics are being more emphasized than others. We confirm that various structural metrics can effectively represent code duplication, leading to different impacts on software quality. Some metrics contribute to improvements, while others may lead to degradation.
Conclusion:
Most of the mapped metrics associated with the main quality attributes successfully capture developers’ intentions for removing code duplicates, as is evident from the commit messages. However, certain metrics do not fully capture these intentions.
期刊介绍:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal''s scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics,
• Software processes,
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premiere outlet for systematic literature studies in software engineering.