Information and Software Technology: latest articles (page 4)

Does it smell? A homogeneous stacking approach for code smell prediction

IF 3.8, Zone 2 (Computer Science)

Information and Software Technology Pub Date : 2025-06-11 DOI: 10.1016/j.infsof.2025.107801

Rim El Jammal, Danielle Azar

Context: Code smells, defined as detrimental patterns and design choices in software development, significantly impact various aspects of software quality, such as maintainability, reusability, and stability. These harmful effects can disrupt the software development cycle and result in a waste of development and managerial resources.

Objective: Although code smell detection has attracted considerable attention in recent years, the existing literature still shows certain limitations: most studies have been conducted on small data sets, consider only a small number of code smells at once, and are evaluated using few performance metrics.

Methods: In this work, we propose a Homogeneous Stacking Classifier to predict the presence of nine different code smells. We resort to feature selection to keep the attributes relevant to each code smell.

Results: We use a large data set of 19,000 instances and evaluate the performance of our proposed model using eight different metrics, comparing it to state-of-the-art machine learning techniques that have proven to perform well in current research.

Conclusion: The proposed approach statistically significantly outperforms the other models in most cases, thereby affirming its efficacy in code smell detection.

Volume 186, Article 107801. Publication type: Journal Article. Open access: no. Source: Semantic Scholar (paper ID 144280980).

Citations: 0

SVA-ICL: Improving LLM-based software vulnerability assessment via in-context learning and information fusion

IF 3.8, Zone 2 (Computer Science)

Information and Software Technology Pub Date : 2025-06-11 DOI: 10.1016/j.infsof.2025.107803

Chaoyang Gao, Xiang Chen, Guangbei Zhang

Context: Software vulnerability assessment (SVA) is critical for identifying, evaluating, and prioritizing security weaknesses in software applications.

Objective: Despite the increasing application of large language models (LLMs) in various software engineering tasks, their effectiveness in SVA remains underexplored.

Method: To address this gap, we introduce a novel approach, SVA-ICL, which leverages in-context learning (ICL) to enhance LLM performance. Our approach involves the selection of high-quality demonstrations for ICL through information fusion, incorporating both source code and vulnerability descriptions. For source code, we consider semantic, lexical, and syntactic similarities, while for vulnerability descriptions, we focus on textual similarity. Based on the selected demonstrations, we construct context prompts and consider DeepSeek-V2 as the LLM for SVA-ICL.

Results: We evaluate the effectiveness of SVA-ICL using a large-scale dataset comprising 12,071 C/C++ vulnerabilities. Experimental results demonstrate that SVA-ICL outperforms state-of-the-art SVA baselines in terms of Accuracy, F1-score, and MCC measures. Furthermore, ablation studies highlight the significance of component customization in SVA-ICL, such as the number of demonstrations, the demonstration ordering strategy, and the optimal fusion ratio of different modalities.

Conclusion: Our findings suggest that leveraging ICL with information fusion can improve the effectiveness of LLM-based SVA, warranting further research in this direction.

Volume 186, Article 107803. Publication type: Journal Article. Open access: no. Source: Semantic Scholar (paper ID 144262650).
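
The demonstration-selection step described in this abstract can be illustrated with a minimal sketch. It assumes a token-level Jaccard similarity as a stand-in for both the code and the description similarity, and a single hypothetical fusion weight `alpha`; the paper's actual similarity measures, fusion ratio, prompt layout, and label set are not given in this listing, so every name and value below is illustrative only.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in measure)."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_demonstrations(query, pool, k=2, alpha=0.5):
    """Rank candidates by a fused similarity score and return the top k.

    alpha is a hypothetical fusion weight:
    alpha * code similarity + (1 - alpha) * description similarity.
    """
    def fused(item):
        return (alpha * jaccard(query["code"], item["code"])
                + (1 - alpha) * jaccard(query["desc"], item["desc"]))
    return sorted(pool, key=fused, reverse=True)[:k]


def build_prompt(query, demos):
    """Assemble a context prompt: demonstrations first, then the query."""
    parts = [
        f"Code: {d['code']}\nDescription: {d['desc']}\nSeverity: {d['severity']}"
        for d in demos
    ]
    parts.append(f"Code: {query['code']}\nDescription: {query['desc']}\nSeverity:")
    return "\n\n".join(parts)


# Made-up candidate pool; the severity labels are illustrative.
pool = [
    {"code": "strcpy(buf, input);", "desc": "stack buffer overflow in copy", "severity": "HIGH"},
    {"code": "free(p); free(p);", "desc": "double free of heap pointer", "severity": "HIGH"},
    {"code": "printf(user_fmt);", "desc": "format string from user input", "severity": "MEDIUM"},
]
query = {"code": "strcpy(dst, src);", "desc": "buffer overflow when copying string"}

demos = select_demonstrations(query, pool, k=1)
prompt = build_prompt(query, demos)
```

With these toy inputs the `strcpy` example ranks first because it shares tokens with the query in both the code and the description; changing `alpha` shifts how much each modality contributes to the ranking.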

Citations: 0

A Mining-Software-Repository study on deprecated API usages in open-source Java software applications

IF 3.8, Zone 2 (Computer Science)

Information and Software Technology Pub Date : 2025-06-07 DOI: 10.1016/j.infsof.2025.107782

Pietro Cassieri, Simone Romano, Giuseppe Scanniello

Context: A deprecated API (Application Programming Interface) is an API that its original developers no longer recommend using. Although deprecated APIs (i.e., deprecated fields, methods, and classes) are still implemented, they are likely to be removed in future implementations. Consequently, developers are advised against using deprecated APIs in newly written code and are encouraged to update existing code to remove any deprecated API usage.

Objective: We aimed to gather preliminary empirical evidence on deprecated API usages in open-source Java applications.

Methods: To pursue this goal, we conducted an exploratory Mining-Software-Repository (MSR) study in which we quantitatively analyzed the commit histories of 14 applications whose software projects were top-starred on GitHub.

Results: The most important takeaways of our study can be summarized as follows: (i) deprecated API usages are quite widespread in the studied software applications; (ii) in only half of these applications do developers remove deprecated API usages as soon as possible; (iii) consuming their own deprecated APIs is a prevalent phenomenon in half of the studied applications; (iv) the introductions and removals of deprecated API usages are mostly due to changes performed by senior contributors; (v) developers mostly introduce and remove deprecated API usages when they are far from publishing a release version; and (vi) the introductions and removals of deprecated API usages are often undocumented in commit messages.

Conclusion: The outcomes of our study suggest that developers should better handle deprecated API usages.

Volume 186, Article 107782. Publication type: Journal Article. Open access: no. Source: Semantic Scholar (paper ID 144254386).

Citations: 0

A multi-year grey literature review on AI-assisted test automation

IF 3.8, Zone 2 (Computer Science)

Information and Software Technology Pub Date : 2025-06-06 DOI: 10.1016/j.infsof.2025.107799

Filippo Ricca, Alessandro Marchetto, Andrea Stocco

Context: Test Automation (TA) techniques are crucial for quality assurance in software engineering but face limitations such as high test suite maintenance costs and the need for extensive programming skills. Artificial Intelligence (AI) offers new opportunities to address these issues through automation and improved practices.

Objective: Given the prevalent usage of AI in industry, sources of truth are held in grey literature as well as the minds of professionals, stakeholders, developers, and end-users. To this end, our study surveys grey literature to explore how AI is adopted in TA, focusing on the problems it solves, its solutions, and the available tools. Additionally, the study is complemented by expert insights.

Methods: Over five years, we reviewed more than 3,600 grey literature sources, including blogs, white papers, and user manuals, and filtered them down to 342 documents to develop taxonomies of TA problems and AI solutions. We also cataloged 100 AI-driven TA tools and interviewed five expert software testers to gain insights into AI’s current and future role in TA.

Results: The study found that manual test code development and maintenance are the main challenges in TA. In contrast, automated test generation and self-healing test scripts are the most common AI solutions. We identified 100 AI-based TA tools, with Applitools, Testim, Functionize, AccelQ, and Mabl being the most adopted in practice.

Conclusion: This paper offers a detailed overview of AI’s impact on TA through grey literature analysis and expert interviews. It presents new taxonomies of TA problems and AI solutions, provides a catalog of AI-driven tools, and relates solutions to problems and tools to solutions. Interview insights further revealed the state and future potential of AI in TA. Our findings support practitioners in selecting TA tools and guide future research.

Volume 186, Article 107799. Publication type: Journal Article. Open access: no. Source: Semantic Scholar (paper ID 144297130).

Citations: 0

Trust requirements in sociotechnical systems: A systematic literature review

IF 3.8, Zone 2 (Computer Science)

Information and Software Technology Pub Date : 2025-06-05 DOI: 10.1016/j.infsof.2025.107796

Geicianfran Roque, José Nascimento, Rafael Souza, Carina Alves, João Araújo

Context: Trust in Sociotechnical Systems (STS) has emerged as an essential requirement in everyday interactions, with notable relevance in fostering user acceptance of novel technologies. The growing dependence on automated and interactive systems underlines the importance of understanding the trust requirements in these systems in depth.

Objective: This study aims to synthesize primary studies that address trust requirements in sociotechnical systems to identify key definitions, approaches, tools, processes, and application domains.

Methods: We conducted a Systematic Literature Review (SLR) to answer four research questions. We searched four databases for studies: IEEE Xplore, ACM Digital Library, Scopus, and Springer. In addition, we performed snowballing to complement the automatic search.

Results: We reviewed 42 primary studies. Our analysis indicates that the definition of trust requirements in sociotechnical systems is multifaceted and context-dependent. We observed that trust is influenced by factors such as data security, transparency in interactions, and the reliability of systems. We identified different approaches to specify trust requirements in sociotechnical systems, such as agent-oriented modeling, goal-oriented modeling, and ontological modeling.

Conclusion: The study highlights the crucial need for more research that adopts holistic and interdisciplinary approaches to address trust requirements in several domains and popular areas such as Artificial Intelligence (AI). The findings suggest the importance of developing approaches based on conceptual models to address users’ trust requirements effectively.

Volume 186, Article 107796. Publication type: Journal Article. Open access: no. Source: Semantic Scholar (paper ID 144280981).

Citations: 0

A model-driven architecture approach for recovering microservice architectures: Defining and evaluating MiSAR

IF 3.8, Zone 2 (Computer Science)

Information and Software Technology Pub Date : 2025-06-03 DOI: 10.1016/j.infsof.2025.107808

Nuha Alshuqayran, Nour Ali, Roger Evans

Context: Microservice architecture is an architectural style in modern software systems, characterized by small, independent services called microservices. This architecture is ideal to facilitate rapid feature deployment. However, it presents a challenge for software engineers, who often lack a comprehensive architectural view due to the distributed nature and complex interdependencies of microservices.

Objective: This paper presents a Model Driven Architecture approach for MicroService Architecture Recovery called MiSAR. Building on previous work that defined a Platform Independent Metamodel, this study seeks to extend this metamodel, introduce a Platform Specific Metamodel, and establish mapping rules. The goal is to enable the semi-automatic recovery of architectural models for microservice systems.

Methods: An empirical study was conducted on nine microservice systems to define MiSAR’s artefacts and support semi-automatic recovery of architectural models. These artefacts were then implemented and used to semi-automatically recover the architectures of three systems. The effectiveness of MiSAR was evaluated using metrics such as recall, precision, and F-measure, to assess the recovered models against actual architectures. We also compared the recovered architectural models with the ones documented by the developers.

Results: The study identified key requirements for the Platform Independent Metamodel to support comprehensive microservice architecture recovery, leading to an incremental extension of the MiSAR Platform Independent Metamodel. Mapping rules were established to effectively transform Platform Specific Models into Platform Independent ones. Furthermore, MiSAR was successfully implemented to recover architecture models. An evaluation using three systems demonstrated that MiSAR could recover architectural models with a high degree of completeness and correctness when compared with the actual architecture.

Conclusion: The MiSAR artefacts, including the extended Platform Independent Metamodel and mapping rules, effectively produce expressive architectural models of microservice systems. The evaluation confirmed MiSAR’s ability to semi-automatically recover accurate architectural models, providing a holistic view often missing in current software engineering practices.

Volume 186, Article 107808. Publication type: Journal Article. Open access: no. Source: Semantic Scholar (paper ID 144270062).
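
The recall, precision, and F-measure evaluation mentioned in this abstract can be sketched as a set comparison between recovered and actual architectural elements. This is a generic illustration of the three metrics, not the paper's evaluation procedure; the element names and the choice to treat an architecture as a flat set of names are assumptions made for the example.

```python
def precision_recall_f1(recovered: set, actual: set):
    """Compare a recovered model against the actual architecture.

    precision = correct recovered elements / all recovered elements
    recall    = correct recovered elements / all actual elements
    F-measure = harmonic mean of precision and recall
    """
    true_positives = len(recovered & actual)
    precision = true_positives / len(recovered) if recovered else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical microservice names, for illustration only.
actual = {"gateway", "orders", "payments", "inventory"}
recovered = {"gateway", "orders", "payments", "billing"}

p, r, f = precision_recall_f1(recovered, actual)
```

Here three of the four recovered elements are correct and three of the four actual elements are found, so precision, recall, and F-measure all equal 0.75.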

Citations: 0

A survey of coverage-guided greybox fuzzing with deep neural models

IF 3.8, Zone 2 (Computer Science)

Information and Software Technology Pub Date : 2025-06-02 DOI: 10.1016/j.infsof.2025.107797

Junyang Qiu, Yupeng Jiang, Yuantian Miao, Wei Luo, Lei Pan, Xi Zheng

Abstract: Coverage-guided greybox fuzzing (CGF) has emerged as a powerful technique for software vulnerability detection, yet traditional techniques often struggle with the increasing complexity of modern software systems and the vastness of input spaces. Deep neural networks (DNNs) have begun to fundamentally transform CGF by addressing these limitations through automated feature extraction, adaptive input generation, and intelligent path prioritization. However, despite these advancements, critical gaps persist in understanding the state-of-the-art landscape. Existing studies often lack rigorous benchmarks to evaluate scalability and generalizability, fail to address the interpretability of neural-guided decisions, and overlook the integration of emerging paradigms such as large language models (LLMs) and neurosymbolic reasoning. This survey systematically bridges these gaps by providing a comprehensive taxonomy of DNN-driven CGF techniques, analyzing their strengths and limitations across key fuzzing stages: seed generation, selection, and mutation. We find that although DNNs have significantly improved fuzzing efficiency, challenges such as semantically invalid seeds, high computational overhead, and limited cross-domain adaptability remain unresolved. Most importantly, we identify two transformative directions with the potential to redefine CGF: (1) LLM-powered fuzzing, which combines generative AI with domain-specific fine-tuning to produce context-aware inputs; and (2) neurosymbolic integration, which merges the precision of symbolic execution with the scalability of neural networks to tackle path explosion. By synthesizing these insights, this survey not only clarifies the state-of-the-art but also outlines a roadmap for developing robust, explainable, and widely applicable intelligent fuzzers. The future of CGF lies in hybrid models that integrate data-driven learning with formal methods, paving the way for autonomous vulnerability discovery in an era of increasingly complex software systems.

Volume 186, Article 107797. Publication type: Journal Article. Open access: no. Source: Semantic Scholar (paper ID 144230286).
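
For readers unfamiliar with the fuzzing stages named in this abstract, the following toy loop shows conventional coverage-guided fuzzing with no neural model at all. The target function, branch labels, and mutation operators are invented for illustration; the comments only mark where, under that assumption, a learned model of the kind the survey discusses could replace a random choice.

```python
import random


def target(data: bytes) -> set:
    """Toy program under test; returns the set of branch ids it executed."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("b1")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("b2")
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add("b3")
    return covered


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Mutation stage: overwrite one random byte, or append a random byte."""
    buf = bytearray(seed)
    if buf and rng.random() < 0.8:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(initial_seeds, iterations=2000, rng_seed=0):
    """Keep every input that reaches a branch not seen before."""
    rng = random.Random(rng_seed)
    corpus = list(initial_seeds)
    global_coverage = set()
    for seed in corpus:
        global_coverage |= target(seed)
    for _ in range(iterations):
        seed = rng.choice(corpus)       # seed selection (uniform; a model could rank seeds)
        candidate = mutate(seed, rng)   # seed mutation (random; a model could pick positions)
        coverage = target(candidate)
        if not coverage <= global_coverage:  # new branch reached
            corpus.append(candidate)
            global_coverage |= coverage
    return corpus, global_coverage


corpus, coverage = fuzz([b"AAA"])
```

The feedback signal is the only thing that distinguishes this from blind random testing: inputs that uncover a new branch are added to the corpus and become starting points for later mutations.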

Citations: 0

Translating code with Large Language Models and human-in-the-loop feedback

IF 3.8, Zone 2 (Computer Science)

Information and Software Technology Pub Date : 2025-05-31 DOI: 10.1016/j.infsof.2025.107785

Gabriele Dario De Siano, Anna Rita Fasolino, Giancarlo Sperlí, Andrea Vignali

Context: In recent years, the code translation task has arisen as one of the major software issues in maintaining software quality during migration over complex infrastructure. This task involves human subjects with different background knowledge and could introduce errors due to the semantic gap between the programming languages and the complexity of the task. Generative Artificial Intelligence (AI) has shown good capabilities in code generation, although this is highly dependent on the human factor.

Objective: This paper investigates, from the human perspective, the use of three Generative AI tools (ChatGPT, Google Bard, and GitHub Copilot) in the context of translation tasks from code written in query languages to code written in framework-specific code languages, specifically focused on SQL dialects and PySpark. This translation is especially crucial during the migration from centralized architectures to cloud-based architectures.

Methods: We evaluate the usefulness of these tools, the quality of the generated code, and their impact on performance. The models are tested with queries of various types in three different SQL dialects, considering three usage scenarios of increasing complexity. The study involves 15 participants with diverse programming backgrounds, who aim to solve tasks by interacting multiple times with the tools and manually changing the code.

Results: The findings show positive performance: the tools proved reliable in generating coherent translations, achieved 100% precision in most tasks with a slight decrease in more complex scenarios, and produced well-documented code with a response time of under 2 min, with Google Bard responding 50% faster than the others.

Conclusion: This paper establishes a methodology and both quantitative and qualitative metrics for evaluating how generative AI tools streamline code translation, shifting the emphasis from production to refinement. It underscores the importance of continuously improving these tools to integrate them into developers’ workflows and to provide guidelines for intelligent use.

Volume 186, Article 107785. Publication type: Journal Article. Open access: no. Source: Semantic Scholar (paper ID 144254385).

Citations: 0

An empirical study on architectural smells through a pipeline for continuous technical debt assessment

IF 3.8, Zone 2 (Computer Science)

Information and Software Technology Pub Date : 2025-05-30 DOI: 10.1016/j.infsof.2025.107783

Matteo Bochicchio, Darius Sas, Alessandro Gilardi, Francesca Arcelli Fontana

Context: Architectural smells are a well-known indicator of architectural technical debt; their presence can have a great impact on the maintainability and evolvability of a project. Hence, it is important to carefully study and monitor them.

Objective: In this paper, we describe an empirical study on the analysis of the correlations existing between architectural smells and co-changes, with the aim of getting further insights into how architectural smells can influence maintenance efforts.

Method: Using the Goal-Question-Metric approach, we compared pairs of files affected by smells with clean ones to determine if smelly pairs co-change more frequently. To collect the data, we exploit a new data collection pipeline based on Apache Airflow to generate large-scale, up-to-date datasets with static analysis tools. For the current study, the pipeline uses Arcan 2, a static analysis tool for architectural smell detection.

Results: The empirical study, conducted on a set of projects analyzed by the pipeline, found that the median co-change rate in smelly (both files affected) and mixed (one file affected) pairs was higher than in clean pairs. Moreover, the co-change rate of the smelly pairs is higher than that of the mixed ones. This result became more significant as the lines of code increased.

Conclusion: The empirical study found that architectural smells are linked to higher co-change rates in affected files, leading to increased maintenance efforts for developers. Moreover, the results highlight the value of the pipeline data and offer useful insights for managing architectural technical debt.

Volume 185, Article 107783. Publication type: Journal Article. Open access: no. Source: Semantic Scholar (paper ID 144194439).
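
The comparison of co-change rates between smelly and clean file pairs can be made concrete with a small sketch. The abstract does not define the co-change rate, so the definition used here (commits touching both files divided by commits touching either file) is an assumption, and the commit history and file names are made up.

```python
from statistics import median


def co_change_rate(commits, file_a, file_b):
    """Assumed definition: commits touching both files / commits touching either."""
    both = sum(1 for files in commits if file_a in files and file_b in files)
    either = sum(1 for files in commits if file_a in files or file_b in files)
    return both / either if either else 0.0


# Each commit is modelled as the set of files it changed (toy data).
commits = [
    {"A.java", "B.java"},
    {"A.java", "B.java", "C.java"},
    {"A.java"},
    {"C.java", "D.java"},
    {"D.java"},
    {"C.java"},
]

# Hypothetical classification of pairs, as a smell detector might provide it.
smelly_pairs = [("A.java", "B.java")]
clean_pairs = [("C.java", "D.java")]

smelly_median = median(co_change_rate(commits, a, b) for a, b in smelly_pairs)
clean_median = median(co_change_rate(commits, a, b) for a, b in clean_pairs)
```

In this toy history the smelly pair changes together in two of the three commits that touch it, while the clean pair changes together in one of four, which mirrors the direction of the reported result without reproducing its data.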

Citations: 0

Assessing output reliability and similarity of large language models in software development: A comparative case study approach

IF 3.8, Zone 2 (Computer Science)

Information and Software Technology Pub Date : 2025-05-29 DOI: 10.1016/j.infsof.2025.107787

Dae-Kyoo Kim, Hua Ming

Context: Generative large language models (LLMs) are increasingly used across various activities in software development, offering significant potential to enhance productivity. However, there is a lack of systematic study examining the reliability and similarity of the outputs from these models.

Objective: This work presents a comparative analysis of the reliability (defined as the consistency and correctness of software artifacts) and similarity of LLM outputs in software development.

Method: To accomplish the objective, we introduce a structured approach for assessing the reliability and similarity of outputs from five prominent LLMs (ChatGPT, Claude, Copilot, Gemini, and Meta) and apply it within two case studies focused on developing a food order and delivery system and a smart wallet system.

Results: The study found that the overall output reliability of the models is rated at 0.82, with Claude outperforming the other models at 0.92, followed by ChatGPT at 0.90, Copilot at 0.80, Meta at 0.75, and Gemini at 0.71. The models demonstrated an overall 57% similarity and 43% variability in their outputs, highlighting the uniqueness of the models.

Conclusions: Although LLMs overall exhibit decent output reliability, to varying degrees, they still require human oversight and review of their outputs before implementation. LLMs present unique characteristics that practitioners should consider before adoption.

Volume 185, Article 107787. Publication type: Journal Article. Open access: no. Source: Semantic Scholar (paper ID 144169085).
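
The reported figures can be cross-checked with a few lines of arithmetic. The abstract does not say how the overall reliability of 0.82 was aggregated; the sketch below assumes an unweighted mean of the five per-model scores, which happens to be consistent with the reported value after rounding.

```python
from statistics import mean

# Per-model reliability scores as reported in the abstract.
scores = {
    "Claude": 0.92,
    "ChatGPT": 0.90,
    "Copilot": 0.80,
    "Meta": 0.75,
    "Gemini": 0.71,
}

# Assumption: the overall figure is an unweighted mean of the five scores.
overall = round(mean(list(scores.values())), 2)

# Ranking from most to least reliable.
ranking = sorted(scores, key=scores.get, reverse=True)

# The similarity and variability shares are complementary percentages.
similarity, variability = 57, 43
```

The five scores sum to 4.08, giving a mean of 0.816, which rounds to the reported 0.82; the 57% similarity and 43% variability add up to 100%.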

Citations: 0