{"title":"An extensible, feature-based framework for fine-grained code quality assessment","authors":"Tewfik Ziadi , Karim Ghallab , Zaak Chalal","doi":"10.1016/j.infsof.2025.107934","DOIUrl":"10.1016/j.infsof.2025.107934","url":null,"abstract":"<div><h3>Context:</h3><div>Assessing code quality is essential for maintaining and evolving software systems. While traditional tools like SonarQube and Snyk offer valuable insights at the application level, they lack support for feature-specific analysis, making it difficult to understand how quality issues are distributed across the functional structure of a system.</div></div><div><h3>Objectives:</h3><div>This paper introduces I<span>nsight</span>M<span>apper</span>, a novel approach designed to bridge this gap by enabling feature-oriented quality analysis. The goal is to assess and compare the quality of individual features, identify feature-level hotspots, and support strategic maintenance decisions.</div></div><div><h3>Methods:</h3><div>I<span>nsight</span>M<span>apper</span> leverages existing feature location techniques to project quality analysis results onto feature implementations. We evaluate the approach on three case studies, including a recognized benchmark in the feature location domain. These evaluations demonstrate I<span>nsight</span>M<span>apper</span>’s ability to perform fine-grained, feature-oriented code quality assessment using results from SonarQube and Snyk.</div></div><div><h3>Results:</h3><div>The study shows that I<span>nsight</span>M<span>apper</span> effectively reveals how quality issues are distributed across features, uncovers features with disproportionate technical debt, and supports prioritization strategies grounded in functional relevance. 
The approach also enables the computation of feature-level quality scores, facilitating comparisons between analyzers and across features.</div></div><div><h3>Conclusion:</h3><div>I<span>nsight</span>M<span>apper</span> offers an extensible and practical solution for feature-oriented quality assessment. By projecting analysis results onto the features of applications, it enhances the interpretability of quality data and paves the way for more targeted maintenance and evolution strategies.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107934"},"PeriodicalIF":4.3,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LocVul: Line-level vulnerability localization based on a Sequence-to-Sequence approach","authors":"Ilias Kalouptsoglou , Miltiadis Siavvas , Apostolos Ampatzoglou , Dionysios Kehagias , Alexander Chatzigeorgiou","doi":"10.1016/j.infsof.2025.107940","DOIUrl":"10.1016/j.infsof.2025.107940","url":null,"abstract":"<div><h3>Context:</h3><div>The development of secure software systems depends on early and accurate vulnerability identification. Manual inspection is a time-consuming process that requires specialized knowledge. Therefore, as software complexity grows, automated solutions become essential. Vulnerability Prediction (VP) is an emerging mechanism that identifies whether software components contain vulnerabilities, commonly using Machine Learning models trained to classify components as vulnerable or clean. Recent explainability-based approaches attempt to rank the lines based on their influence on the output of the VP Models (VPMs). However, challenges remain in accurately localizing the vulnerable lines.</div></div><div><h3>Objective:</h3><div>This study aims to examine an alternative to explainability-based approaches to overcome their shortcomings. Specifically, explainability-based methods depend on the type and accuracy of the file or function-level VPMs, inherit possible misleading patterns, and can indicate neither the exact vulnerable code snippet nor the number of vulnerable lines.</div></div><div><h3>Method:</h3><div>To address these limitations, this study introduces an innovative approach based on fine-tuning Large Language Models on a Sequence-to-Sequence objective to directly return the vulnerable lines of a given function. 
The method is evaluated on the Big-Vul dataset to assess its capacity for fine-grained vulnerability detection.</div></div><div><h3>Results:</h3><div>The results demonstrate that the proposed method significantly outperforms the explainability-based baseline both in terms of accuracy and cost-effectiveness.</div></div><div><h3>Conclusions:</h3><div>The proposed approach marks a significant advancement in automated vulnerability detection by enabling accurate line-level localization of vulnerabilities.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107940"},"PeriodicalIF":4.3,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MHP-RCA: Multivariate Hawkes Process-based Root Cause Analysis in microservice systems","authors":"Zekun Zhang , Jian Wang , Bing Li , Yu Liu , Hongyue Wu , Patrick C.K. Hung","doi":"10.1016/j.infsof.2025.107938","DOIUrl":"10.1016/j.infsof.2025.107938","url":null,"abstract":"<div><h3>Context:</h3><div>Recent years have witnessed a prevailing trend of developing applications using microservice architectures. Microservice systems typically involve multiple containers that share resources on a single physical host, thereby complicating the interdependencies among microservices. This complexity significantly hinders the identification of root causes of performance issues.</div></div><div><h3>Objective:</h3><div>Performance issues can manifest in various forms. Existing approaches often overlook other potential failure indicators, such as process anomalies that are discernible in audit logs. This paper aims to refine the granularity of root cause analysis to the process level.</div></div><div><h3>Methods:</h3><div>This paper proposes a novel approach called MHP-RCA (Multivariate Hawkes Process-based Root Cause Analysis), which integrates diverse data types, including metrics and audit logs, to localize the root cause in microservice systems. MHP-RCA generates anomalous events from the observable data, then leverages the multivariate Hawkes process to construct causal graphs for effective root cause identification.</div></div><div><h3>Results:</h3><div>Extensive experiments, involving the injection of various anomalies into four widely used open-source benchmarks, demonstrate that MHP-RCA surpasses multiple baseline methods in most cases. 
Compared to the best-performing baseline approach, MHP-RCA achieves an average overall improvement of 2.5% in AC@1 and 3.7% in AC@5.</div></div><div><h3>Conclusion:</h3><div>The proposed method MHP-RCA, which considers audit logs and metrics, can localize the root cause of microservice anomalies at the process level.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107938"},"PeriodicalIF":4.3,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ISVMT: An approach of indicator systems validation based on metamorphic testing and data mutation","authors":"GuoHao Ma , Bo Yang , XiaoKai Xia , Luo Xu","doi":"10.1016/j.infsof.2025.107944","DOIUrl":"10.1016/j.infsof.2025.107944","url":null,"abstract":"<div><h3>Context:</h3><div>Indicator systems are pivotal in assessing software quality, conducting economic analysis, and various other domains. However, their validation is frequently hindered by the absence of explicit expected outputs, posing a significant challenge to their reliability and effectiveness.</div></div><div><h3>Objective:</h3><div>This paper aims to address this challenge by proposing a novel validation method for indicator systems based on metamorphic testing (MT). The method seeks to eliminate the oracle problem by generating follow-up test cases and verifying their consistency through designed logical constraints, known as metamorphic relations, between inputs and outputs.</div></div><div><h3>Methods:</h3><div>To validate the proposed method, we conducted empirical research on three diverse cases: software quality evaluation, red wine quality assessment, and ecological and economic benefit evaluation. We designed metamorphic relations specific to each case and applied our method to identify inconsistencies and potential errors in the indicator systems.</div></div><div><h3>Results:</h3><div>Our experiments successfully identified errors in real-world indicator systems, demonstrating the method’s capability to detect flaws that traditional methods might overlook. 
Furthermore, we performed mutation testing, achieving an average mutation score of 0.83 on the mutation dataset, which significantly outperforms the traditional statistical analysis method with a maximum score of 0.65.</div></div><div><h3>Conclusion:</h3><div>This paper presents a generalized solution for the validation of indicator systems lacking clear expectations, offering substantial application value in the fields of software engineering and decision support. The proposed method not only increases confidence in the reliability of indicator systems but also opens up promising research avenues for further improving the accuracy and efficiency of validation processes in complex domains.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107944"},"PeriodicalIF":4.3,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keys4BR: Key sentences-based model fine-tuning for better semantic representation of bug reports","authors":"Mengjiao Wang , Biyu Cai , Weiqin Zou, Jingxuan Zhang","doi":"10.1016/j.infsof.2025.107943","DOIUrl":"10.1016/j.infsof.2025.107943","url":null,"abstract":"<div><h3>Context:</h3><div>Large language models have been increasingly applied to semantic representation of bug reports due to their deep understanding of natural language. Fine-tuning large language models using bug report text is a common practice to enable models to learn domain-specific knowledge. However, the varying quality of bug reports can introduce noise, leading to poor performance in downstream tasks.</div></div><div><h3>Objective:</h3><div>To improve the quality of semantic representation for bug reports, we propose Keys4BR, a key sentences-based model fine-tuning approach for better semantic representation of bug reports.</div></div><div><h3>Method:</h3><div>Specifically, we use keywords that help accurately localize bugs as anchors, and design and apply a key sentences selection strategy to choose the portions of the text containing these keywords as the key information. Then we adopt a lightweight fine-tuning approach to fine-tune the large language model.</div></div><div><h3>Results:</h3><div>Experiments on bug reports from five open-source projects demonstrate that Keys4BR significantly improves the performance of four downstream tasks. The results indicate that Keys4BR achieves superior semantic representation of bug reports compared to the VSM model, the model pre-trained on the general corpus, and the model fine-tuned on original bug reports, with average F1 score improvements of 9%, 9%, and 6%, respectively. 
Additionally, we further validate the effectiveness of the key sentences selection and fine-tuning strategies.</div></div><div><h3>Conclusion:</h3><div>Keys4BR can effectively extract key semantic information from bug reports, thereby enhancing the representation capability and generalization performance of large language models in bug report management tasks.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107943"},"PeriodicalIF":4.3,"publicationDate":"2025-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic test case generation using natural language processing: A systematic mapping study","authors":"Jordy Navarro, Ronald Ibarra","doi":"10.1016/j.infsof.2025.107929","DOIUrl":"10.1016/j.infsof.2025.107929","url":null,"abstract":"<div><h3>Context:</h3><div>Artificial intelligence (AI) has made significant progress in recent years, which has motivated its use in many disciplines and industrial domains, including software engineering, especially in the testing process, where many research efforts have been made. These studies focus on automatic test case generation using natural language processing (NLP), an emerging branch of AI. Despite these efforts, the literature lacks a structured and systematic approach, since reported mappings and systematic literature reviews have limitations in their scope.</div></div><div><h3>Objective:</h3><div>This study aims to systematically organize and synthesize the existing literature to establish the state of the art in the automatic generation of test cases using NLP.</div></div><div><h3>Methodology:</h3><div>We conducted a systematic mapping study following Kai Petersen’s methodology, exploring five databases. The initial search yielded 1262 articles, of which 61 were selected. We posed 16 thematic questions and 4 non-thematic questions.</div></div><div><h3>Results:</h3><div>The findings reveal an increase in the number of articles published in journals starting in 2022. Among the most reported NLP techniques are POS tagging, dependency parsing and tokenization, implemented with tools such as Stanford Core NLP and NLTK. The reported approaches mostly achieved a medium level of automation, using natural and formal language requirements as main inputs. 
Only 9 articles explicitly mention the use of test case design techniques, such as boundary value analysis, equivalence class partitioning, state transition testing, and decision tables.</div></div><div><h3>Conclusions:</h3><div>We systematically identified and organized the reported primary studies on the automatic or semi-automatic generation of software test cases applying NLP.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107929"},"PeriodicalIF":4.3,"publicationDate":"2025-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An inquiry into the neutrality of search engines: Developing tools and indicators to compare bias on socially sensitive topics","authors":"Romain Badouard , Inna Lyubareva , Patrick Maillé , Bruno Tuffin","doi":"10.1016/j.infsof.2025.107925","DOIUrl":"10.1016/j.infsof.2025.107925","url":null,"abstract":"<div><div>The digital transformation has profoundly reshaped information consumption, with search engines playing a critical role in determining user access to diverse media. Through algorithmic processes, these engines influence content visibility and aggregate news sources, thereby shaping public opinion. As gatekeepers of information, search engines impact the visibility of media outlets, affecting online traffic, revenue, and journalistic diversity. In the context of breaking news and societal issues, search engines facilitate the rapid dissemination of information, often shaping initial narratives. Understanding their role is essential for promoting transparency and ensuring access to a broad spectrum of information. Focusing on movements against police violence, this paper employs an original software tool, initially built to detect bias in search engines, to conduct a comparative analysis across 12 search engines for the terms “Black Lives Matter” and “Justice pour Adama”. 
Our innovative methodology uncovers biases in information diversity, offering valuable insights into the dynamics that influence the visibility of societal issues.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107925"},"PeriodicalIF":4.3,"publicationDate":"2025-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trade-offs between fairness and performance in educational AI: Analyzing post-processing bias mitigation on the OULAD","authors":"Sachini Gunasekara, Mirka Saarela","doi":"10.1016/j.infsof.2025.107933","DOIUrl":"10.1016/j.infsof.2025.107933","url":null,"abstract":"<div><h3>Context:</h3><div>AI-driven educational tools often face a trade-off between fairness and performance, particularly when addressing biases across sensitive demographic attributes. While fairness metrics have been developed to monitor and mitigate bias, optimizing all of these metrics simultaneously is mathematically infeasible, and adjustments to fairness often result in a decrease in overall system performance.</div></div><div><h3>Objective:</h3><div>This study investigates the trade-off between predictive performance and fairness in educational AI systems, focusing on <em>gender</em> and <em>disability</em> as sensitive attributes. We evaluate whether post-processing fairness interventions can mitigate group-level disparities while preserving model usability.</div></div><div><h3>Method:</h3><div>Using the Open University Learning Analytics Dataset, we trained four machine learning models to predict student outcomes. We applied the equalized odds post-processing technique to mitigate bias and assessed model performance with accuracy, F1-score, and AUC, alongside fairness metrics including statistical parity difference (SPD) and equal opportunity difference (EOD). Statistical significance of changes was tested using the Wilcoxon signed-rank test.</div></div><div><h3>Results:</h3><div>All models achieved strong baseline predictive performance, with RF performing best overall. However, systematic disparities were evident, particularly for students with disabilities, showing that high accuracy does not necessarily ensure equitable outcomes. 
Post-processing reduced group-level disparities substantially, with SPD and EOD values approaching zero, though accuracy and F1-scores decreased slightly but significantly. The Random Forest (RF) and artificial neural network (ANN) models were more resilient to fairness adjustments.</div></div><div><h3>Conclusion:</h3><div>This study highlights the importance of fairness-aware machine learning, such as post-processing interventions, and suggests that appropriate mitigation methods should be used to ensure benefits are distributed equitably across diverse learners, without favoring any particular fairness metric.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107933"},"PeriodicalIF":4.3,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145322330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the adoption and maintenance of web GUI testing: Insights from GitHub repositories","authors":"Sergio Di Meglio , Luigi Libero Lucio Starace , Valeria Pontillo , Ruben Opdebeeck , Coen De Roover , Sergio Di Martino","doi":"10.1016/j.infsof.2025.107928","DOIUrl":"10.1016/j.infsof.2025.107928","url":null,"abstract":"<div><h3>Context:</h3><div>Web GUI testing is a quality assessment practice aimed at evaluating the functionality of web applications from the perspective of their end users. While prior studies have explored the technical challenges of automated Web GUI testing, fewer works have examined how this practice is applied in real-world web apps.</div></div><div><h3>Objective:</h3><div>This study aims to investigate the adoption, characteristics, and maintenance of automated web GUI testing practices in open-source web applications, focusing on identifying trends and providing actionable insights for researchers and practitioners.</div></div><div><h3>Method:</h3><div>We conducted a large-scale empirical analysis of 472 web applications on the GitHub platform, developed in <span>Java</span>, <span>JavaScript</span>, <span>Python</span>, and <span>TypeScript</span>. These projects use popular browser automation frameworks like <span>Selenium</span>, <span>Playwright</span>, <span>Cypress</span>, and <span>Puppeteer</span>. The study involved examining project characteristics and analyzing the co-evolution and maintenance of automated web GUI tests over time.</div></div><div><h3>Results:</h3><div>Our findings empirically document automated web GUI testing adoption patterns in open-source projects, providing insights into the practical drivers behind both initial framework adoption and migration between different testing frameworks. Projects incorporating these tests generally show higher community engagement and consistent maintenance efforts. 
The analysis reveals that Web GUI tests tend to co-evolve with the underlying applications, reflecting their integration into the development lifecycle.</div></div><div><h3>Conclusion:</h3><div>The study provides valuable insights into the prevalence and maintenance of Web GUI testing, highlighting practical implications for improving testing practices. Our findings can guide further research on the matter and support practitioners in enhancing their testing strategies.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107928"},"PeriodicalIF":4.3,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Usage patterns of software product metrics in assessing developers’ output: A comprehensive study","authors":"Wentao Chen , Huiqun Yu , Guisheng Fan , Zijie Huang , Yuguo Liang","doi":"10.1016/j.infsof.2025.107935","DOIUrl":"10.1016/j.infsof.2025.107935","url":null,"abstract":"<div><h3>Context:</h3><div>Accurate assessment of developers’ output is crucial for both software engineering research and industrial practice. This assessment often relies on software product metrics such as lines of code (LOC) and quality metrics from static analysis tools. However, existing research lacks a comprehensive understanding of the usage patterns of product metrics, and a single metric is increasingly vulnerable to manipulation, particularly with the emergence of large language models (LLMs).</div></div><div><h3>Objectives:</h3><div>This study aims to investigate (1) how developers can intentionally manipulate commonly used metrics like LOC by using LLMs, (2) whether complex efficiency metrics provide consistent advantages over simpler metrics, and (3) the reliability and cost-effectiveness of quality metrics derived from tools such as SonarQube.</div></div><div><h3>Methods:</h3><div>We conduct empirical analyses involving three LLMs to achieve metric manipulation and evaluate product metric reliability across nine open-source projects. We further validate our findings through a collaboration with a large financial institution facing fairness concerns in developers’ output due to inappropriate metric usage.</div></div><div><h3>Results:</h3><div>We observe that developers can inflate LOC by an average of 60.86% using LLMs, leading to anomalous assessments. Complex efficiency metrics do not yield consistent performance improvements relative to their computational costs. Furthermore, quality metrics from SonarQube and PMD often fail to capture real quality changes and are expensive to compute. 
The software metric migration plan based on our findings effectively reduces evaluation anomalies in the industry and standardizes developers’ commits, confirming our conclusions’ practical validity.</div></div><div><h3>Conclusion:</h3><div>Our findings highlight critical limitations in current metric practices and demonstrate how thoughtful usage patterns of product metrics can improve fairness in developer evaluation. This work bridges the gap between academic insights and industrial needs, offering practical guidance for more reliable developers’ output assessment.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107935"},"PeriodicalIF":4.3,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}