{"title":"Usage patterns of software product metrics in assessing developers’ output: A comprehensive study","authors":"Wentao Chen , Huiqun Yu , Guisheng Fan , Zijie Huang , Yuguo Liang","doi":"10.1016/j.infsof.2025.107935","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><div>Accurate assessment of developers’ output is crucial for both software engineering research and industrial practice. This assessment often relies on software product metrics such as lines of code (LOC) and quality metrics from static analysis tools. However, existing research lacks a comprehensive understanding of the usage patterns of product metrics, and a single metric is increasingly vulnerable to manipulation, particularly with the emergence of large language models (LLMs).</div></div><div><h3>Objectives:</h3><div>This study aims to investigate (1) how developers can intentionally manipulate commonly used metrics like LOC by using LLMs, (2) whether complex efficiency metrics provide consistent advantages over simpler metrics, and (3) the reliability and cost-effectiveness of quality metrics derived from tools such as SonarQube.</div></div><div><h3>Methods:</h3><div>We conduct empirical analyses involving three LLMs to achieve metric manipulation and evaluate product metric reliability across nine open-source projects. We further validate our findings through a collaboration with a large financial institution facing fairness concerns in developers’ output due to inappropriate metric usage.</div></div><div><h3>Results:</h3><div>We observe that developers can inflate LOC by an average of 60.86% using LLMs, leading to anomalous assessments. Complex efficiency metrics do not yield consistent performance improvements relative to their computational costs. Furthermore, quality metrics from SonarQube and PMD often fail to capture real quality changes and are expensive to compute. The software metric migration plan based on our findings effectively reduces evaluation anomalies in the industry and standardizes developers’ commits, confirming our conclusions’ practical validity.</div></div><div><h3>Conclusion:</h3><div>Our findings highlight critical limitations in current metric practices and demonstrate how thoughtful usage patterns of product metrics can improve fairness in developer evaluation. This work bridges the gap between academic insights and industrial needs, offering practical guidance for more reliable developers’ output assessment.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107935"},"PeriodicalIF":4.3000,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584925002745","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Context:
Accurate assessment of developers’ output is crucial for both software engineering research and industrial practice. This assessment often relies on software product metrics such as lines of code (LOC) and quality metrics from static analysis tools. However, existing research lacks a comprehensive understanding of the usage patterns of product metrics, and a single metric is increasingly vulnerable to manipulation, particularly with the emergence of large language models (LLMs).
Objectives:
This study aims to investigate (1) how developers can intentionally manipulate commonly used metrics like LOC by using LLMs, (2) whether complex efficiency metrics provide consistent advantages over simpler metrics, and (3) the reliability and cost-effectiveness of quality metrics derived from tools such as SonarQube.
Methods:
We conduct empirical analyses using three LLMs to manipulate metrics, and we evaluate product metric reliability across nine open-source projects. We further validate our findings through a collaboration with a large financial institution facing fairness concerns in developer output assessment caused by inappropriate metric usage.
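To make the reliability checks concrete, below is a minimal sketch of one way a tool-based quality metric can be compared across two revisions of the same project: if the metric reflects real quality changes, its violation count should move when the code genuinely improves or degrades between revisions. This assumes the PMD 7 command-line tool is installed and on PATH; the checkout paths and the bundled quickstart ruleset are illustrative choices, not the paper's actual setup.

```python
import csv
import io
import subprocess

# Illustrative helper: count PMD rule violations for one source tree.
# PMD exits non-zero when violations are found, so we do not raise on
# the return code; the CSV report is written to stdout by default.
def pmd_violation_count(src_dir: str) -> int:
    result = subprocess.run(
        ["pmd", "check",
         "-d", src_dir,                          # source tree to analyze
         "-R", "rulesets/java/quickstart.xml",   # bundled starter ruleset
         "-f", "csv"],                           # one CSV row per violation
        capture_output=True, text=True,
    )
    rows = list(csv.reader(io.StringIO(result.stdout)))
    return max(len(rows) - 1, 0)  # subtract the CSV header row

# Compare two checked-out revisions of the same project (paths are
# placeholders). A metric that captures real quality changes should
# shift between a revision known to be worse and one known to be better.
before = pmd_violation_count("checkouts/rev_before/src/main/java")
after = pmd_violation_count("checkouts/rev_after/src/main/java")
print(f"PMD violations: {before} -> {after} (delta {after - before})")
```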
Results:
We observe that developers can inflate LOC by an average of 60.86% using LLMs, leading to anomalous assessments. Complex efficiency metrics do not yield consistent performance improvements relative to their computational costs. Furthermore, quality metrics from SonarQube and PMD often fail to capture real quality changes and are expensive to compute. The software metric migration plan based on our findings effectively reduces evaluation anomalies in industry and standardizes developers’ commits, confirming the practical validity of our conclusions.
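To illustrate why raw LOC is so easy to game, the following sketch tallies added lines per author from git history; a count like this treats verbose, LLM-padded code the same as concise code, which is the weakness behind the reported 60.86% average inflation. The repository path is a placeholder and the tallying logic is an illustrative assumption, not the study's measurement tooling.

```python
import subprocess
from collections import defaultdict

# Minimal sketch: tally added lines per author from `git log --numstat`.
def added_loc_by_author(repo_path: str) -> dict[str, int]:
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat",
         "--pretty=format:@%ae"],          # '@' marks each author line
        capture_output=True, text=True, check=True,
    ).stdout
    totals: dict[str, int] = defaultdict(int)
    author = None
    for line in log.splitlines():
        if line.startswith("@"):
            author = line[1:]
        else:
            parts = line.split("\t")       # numstat: added<TAB>removed<TAB>path
            if len(parts) == 3 and parts[0].isdigit() and author:
                totals[author] += int(parts[0])  # binary files report "-"
    return dict(totals)

# Usage (path is a placeholder for one of the studied projects):
# print(added_loc_by_author("/path/to/project"))
```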
Conclusion:
Our findings highlight critical limitations in current metric practices and demonstrate how thoughtful usage patterns of product metrics can improve fairness in developer evaluation. This work bridges the gap between academic insights and industrial needs, offering practical guidance for more reliable developers’ output assessment.
Journal introduction:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results, and much more. Read the Guide for Authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.