Journal of Systems and Software: Latest Articles

Data catalog tools: A systematic multivocal literature review
IF 4.1 Zone 2 Computer Science
Journal of Systems and Software Pub Date: 2025-08-05 DOI: 10.1016/j.jss.2025.112584
Marco Tonnarelli, Indika Kumara, Stefan Driessen, Damian Andrew Tamburri, Willem-Jan van den Heuvel, Patrick Oor
Volume 230, Article 112584
Abstract: A data catalog enables an organization to maintain an inventory of its data assets by collecting and managing the relevant metadata. We conducted a systematic multivocal literature review on data catalogs to understand their features and usage. We systematically selected and analyzed 86 literature sources and 39 catalog tools. We first used the findings from the literature to develop a classification framework comprising 24 fine-grained and five high-level features, along with three maturity levels. Next, we analyzed the 39 tools based on the classification framework. Organizations typically include a data catalog as a component in their big data platforms and use it to support the various phases of the metadata management lifecycle. Hence, we also mapped the catalog features to the requirements of metadata-driven big data architectures, namely data mesh, data lake, and data lakehouse. We also mapped the features to the phases of the metadata management lifecycle. Our findings shall aid organizations in making informed decisions when choosing data catalog tools and help researchers identify the critical research issues in data cataloging and metadata management.
Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
Citations: 0
Towards higher quality software vulnerability data using LLM-based patch filtering
IF 4.1 Zone 2 Computer Science
Journal of Systems and Software Pub Date: 2025-07-31 DOI: 10.1016/j.jss.2025.112581
Charlie Dil, Hui Chen, Kostadin Damevski
Volume 230, Article 112581
Abstract: High-quality vulnerability patch data is essential for understanding vulnerabilities in software systems. Accurate patch data sheds light on the nature of vulnerabilities, their origins, and effective remediation strategies. However, current data collection efforts prioritize rapid release over quality, leading to patches that are incomplete or contain extraneous changes. In addition to supporting vulnerability analysis, high-quality patch data improves automatic vulnerability prediction models, which require reliable inputs to predict issues in new or existing code.
In this paper, we explore using large language models (LLMs) to filter vulnerability data by identifying and removing low-quality instances. Trained on large textual corpora including source code, LLMs offer new opportunities to improve data accuracy. Our goal is to leverage LLMs for reasoning-based assessments of whether a code hunk fixes a described vulnerability. We evaluate several prompting strategies and find that Generated Knowledge Prompting, where the model first explains a hunk’s effect and then assesses whether it fixes the bug, is most effective across three LLMs. Applying this filtering to the BigVul dataset, we show a 7%–9% improvement in prediction precision for three popular vulnerability prediction models. Recall declines slightly, by 2%–8% across models, likely reflecting the impact of reduced dataset size.
Citations: 0
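The two-step Generated Knowledge Prompting flow summarized in the abstract above (explain the hunk first, then judge it) could be sketched roughly as follows. This is only an illustration: the function name `hunk_fixes_vulnerability`, the prompt wording, and the `llm` callable are all hypothetical and are not taken from the paper, whose actual prompts and models are not given in this listing.

```python
from typing import Callable


def hunk_fixes_vulnerability(
    llm: Callable[[str], str],  # hypothetical: any function mapping a prompt to a reply
    vulnerability_description: str,
    hunk: str,
) -> bool:
    """Two-step check: generate an explanation, then ask for a verdict."""
    # Step 1: ask the model to describe what the code hunk does.
    explanation = llm(f"Explain what this code change does:\n{hunk}")

    # Step 2: feed the generated explanation back in and ask for a yes/no verdict.
    verdict = llm(
        "Vulnerability description:\n"
        f"{vulnerability_description}\n\n"
        "Explanation of the code change:\n"
        f"{explanation}\n\n"
        "Does this change fix the described vulnerability? Answer YES or NO."
    )
    return verdict.strip().upper().startswith("YES")


def keep_high_quality(llm, vulnerability_description, hunks):
    """Keep only the hunks that the model judges to fix the vulnerability."""
    return [
        h for h in hunks
        if hunk_fixes_vulnerability(llm, vulnerability_description, h)
    ]
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client library; a real pipeline would also need to handle ambiguous or malformed replies.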
AMACollision: An advanced framework for testing autonomous vehicles based on adversarial multi-agent
IF 4.1 Zone 2 Computer Science
Journal of Systems and Software Pub Date: 2025-07-29 DOI: 10.1016/j.jss.2025.112578
Tiexin Wang, Shuo Tian, Gulent Asalif Minas, Chunyang Bian
Volume 230, Article 112578
Abstract: Autonomous driving, as a typical safety-critical domain, requires strict safety evaluation. Owing to advantages such as high efficiency and low cost, simulation testing is an important means of evaluation. Current research on simulation testing of Autonomous Driving Systems (ADSs) mainly aims at identifying safety-critical driving scenarios. However, the realism of the generated scenarios and the generation efficiency remain two challenges. Therefore, we propose AMACollision, a multi-agent-based framework for generating highly realistic critical driving scenarios. AMACollision employs a deep neural network that integrates multi-modal sensor fusion and temporal decision-making, enabling robust scene understanding and efficient training of the multiple agents that act as Non-Player Characters (NPCs). To enhance the realism of the generated scenarios, a two-stage reward mechanism is introduced, which ensures that NPCs’ behaviors comply with traffic regulations. To evaluate the performance of AMACollision, we integrate it with a high-fidelity simulator and conduct extensive experiments testing two ADSs on three different road structures. Experimental results, collected from seven metrics (e.g., collision rate), demonstrate that AMACollision outperformed the state-of-the-art method in both the realism of the generated scenarios and the generation efficiency.
Citations: 0
Automated spectrum-based model fault localization within a search framework
IF 4.1 Zone 2 Computer Science
Journal of Systems and Software Pub Date: 2025-07-29 DOI: 10.1016/j.jss.2025.112576
Ting Shu, Xinru Xue, Xuesong Yin, Jinsong Xia
Volume 230, Article 112576
Abstract: Models play a crucial role in software engineering, guiding development processes and ensuring quality. Extended Finite State Machines (EFSMs) effectively model complex systems, but ensuring their correctness is a challenging and critical task. This study aims to improve spectrum-based fault localization accuracy in EFSMs by automating the optimization of risk evaluation formula combinations. We propose a Model Fault Localization Framework (MFLF) that automates the generation, selection, and combination of these formulas. Within this framework, the KWOA-LTR method is developed, employing K-Means clustering to classify formulas based on their fault localization capabilities and the Sum of Squared Errors (SSE) metric to determine the optimal number K of formulas. The Whale Optimization Algorithm (WOA) is then used to select an optimal combination of risk evaluation formulas. Subsequently, a learning-to-rank algorithm constructs a fault localization ranking model that integrates the suspiciousness scores derived from these formulas. Leveraging this trained model, KWOA-LTR efficiently and precisely locates faults. Extensive empirical evaluations on 10 representative benchmark EFSMs demonstrate that KWOA-LTR improves fault localization precision, stability, and overall performance, outperforming existing methods and reducing the manual effort required. The MFLF framework has the potential to support automated spectrum-based fault localization in model-based systems. Moreover, KWOA-LTR within this framework is competitive in terms of fault localization performance. The code is publicly available at https://github.com/Renee0715/GMFLF.
Citations: 0
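To make the notion of a "risk evaluation formula" in the abstract above concrete, here is a minimal sketch that scores elements with Ochiai, one widely used spectrum-based formula. The abstract does not say which formulas KWOA-LTR combines, so the choice of Ochiai, the transition names `t1` to `t3`, and the tiny coverage spectrum are assumptions made purely for illustration.

```python
import math


def ochiai(ef: int, ep: int, nf: int) -> float:
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep)).

    ef: failing tests that cover the element
    ep: passing tests that cover the element
    nf: failing tests that do not cover the element
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def rank_elements(coverage, failed):
    """Rank elements from most to least suspicious.

    coverage: dict mapping element name -> list of 0/1 flags, one per test
    failed:   list of booleans, True where the corresponding test failed
    """
    total_failed = sum(failed)
    scores = {}
    for element, hits in coverage.items():
        ef = sum(1 for h, f in zip(hits, failed) if h and f)
        ep = sum(1 for h, f in zip(hits, failed) if h and not f)
        scores[element] = ochiai(ef, ep, total_failed - ef)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical spectrum: three EFSM transitions exercised by four tests.
coverage = {
    "t1": [1, 1, 1, 1],  # covered by every test
    "t2": [1, 0, 1, 0],  # covered only by the failing tests
    "t3": [0, 1, 0, 1],  # covered only by the passing tests
}
failed = [True, False, True, False]
ranking = rank_elements(coverage, failed)
```

In this toy spectrum `t2` is ranked first because only failing tests cover it. A method that combines several formulas, as the abstract describes, would compute one such score per formula and pass the resulting score vector to a ranking model.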
Insights into resource utilization of code small language models serving with runtime engines and execution providers
IF 4.1 Zone 2 Computer Science
Journal of Systems and Software Pub Date: 2025-07-28 DOI: 10.1016/j.jss.2025.112574
Francisco Durán, Matias Martinez, Patricia Lago, Silverio Martínez-Fernández
Volume 230, Article 112574
Abstract: The rapid growth of language models, particularly in code generation, requires substantial computational resources, raising concerns about energy consumption and environmental impact. Optimizing the resource utilization of language model inference is crucial, and Small Language Models (SLMs) offer a promising way to reduce resource demands. Our goal is to analyze the impact of deep learning serving configurations, defined as combinations of runtime engines and execution providers, on resource utilization, in terms of energy consumption, execution time, and computing-resource utilization, from the point of view of software engineers conducting inference in the context of code generation SLMs. We conducted a technology-oriented, multi-stage experimental pipeline using twelve code generation SLMs to investigate energy consumption, execution time, and computing-resource utilization across the configurations. Significant differences emerged across configurations. CUDA execution provider configurations outperformed CPU execution provider configurations in both energy consumption and execution time. Among the configurations, TORCH paired with CUDA demonstrated the greatest energy efficiency, achieving energy savings from 37.99% up to 89.16% compared to other serving configurations. Similarly, optimized runtime engines like ONNX with the CPU execution provider achieved from 8.98% up to 72.04% energy savings within CPU-based configurations. TORCH paired with CUDA also exhibited efficient computing-resource utilization. Serving configuration choice significantly impacts resource utilization. While further research is needed, we recommend the above configurations as best suited to software engineers’ requirements for enhancing serving resource utilization efficiency.
Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
Citations: 0
Evaluating time-dependent methods and seasonal effects in code technical debt prediction
IF 4.1 Zone 2 Computer Science
Journal of Systems and Software Pub Date: 2025-07-28 DOI: 10.1016/j.jss.2025.112545
Mikel Robredo, Nyyti Saarimäki, Matteo Esposito, Davide Taibi, Rafael Peñaloza, Valentina Lenarduzzi
Volume 230, Article 112545
Abstract:
Background: Code Technical Debt (Code TD) prediction has gained significant attention in recent software engineering research. However, no standardized approach to Code TD prediction fully captures the factors influencing its evolution.
Objective: Our study aims to assess the impact of time-dependent models and seasonal effects on Code TD prediction. It evaluates such models against widely used Machine Learning models, also considering the influence of seasonality on prediction performance.
Methods: We trained 11 prediction models with 31 Java open-source projects. To assess their performance, we predicted future observations of the SQALE index. To evaluate the practical usability of our TD forecasting models and their impact on practitioners, we surveyed 23 software engineering professionals.
Results: Our study confirms the benefits of time-dependent techniques, with the ARIMAX model outperforming the others. Seasonal effects improved predictive performance, though the impact remained modest. ARIMAX/SARIMAX models provided well-balanced long-term forecasts. The survey highlighted strong industry interest in short- to medium-term TD forecasts.
Conclusions: Our findings support using techniques that capture time dependence in historical software metric data, particularly for Code TD. Effectively addressing this evidence requires adopting methods that account for temporal patterns.
Citations: 0
Overcoming experimentation challenges in software ecosystems of large product and service organizations: A participatory action research study
IF 4.1 Zone 2 Computer Science
Journal of Systems and Software Pub Date: 2025-07-26 DOI: 10.1016/j.jss.2025.112550
Shady Hegazy, Christoph Elsner, Jan Bosch, Helena Holmström-Olsson
Volume 230, Article 112550
Abstract: Software ecosystems facilitate collaborative innovation and value co-creation among diverse actors through shared technological platforms. However, introducing experimentation practices, such as A/B testing, into these ecosystems within large organizations presents significant challenges due to complex structures, network effects, and complicated organizational dynamics. The challenge is greater still for product and service organizations, especially in business-to-business (B2B) or industrial domains. This paper presents an action research study aiming to overcome the barriers to adopting experimentation-based evolution approaches. The study was conducted within a large software-intensive product and service organization with a vast portfolio of software ecosystems across a wide spectrum of business domains.
Following a participatory action research methodology, the research team worked closely with the participating organization through three iterative cycles of planning, action, observation, and reflection. Data sources included a systematic literature review, expert interviews with 25 participants across 17 software ecosystems, and collaborative workshops with internal stakeholders. The study identifies key organizational, technical, and cultural challenges to introducing experimentation in software ecosystems of large organizations, particularly in business-to-business or industrial domains, and exemplifies a roadmap for iteratively addressing such challenges.
Citations: 0
A Modeling approach based on Coloured Petri Nets for Quantum algorithm
IF 4.1 Zone 2 Computer Science
Journal of Systems and Software Pub Date: 2025-07-26 DOI: 10.1016/j.jss.2025.112567
Shichen Zhou, Xu Guo, Jian-tao Zhou
Volume 230, Article 112567
Abstract: Verifying the correctness of quantum algorithms and lowering the barrier to understanding them are important parts of quantum computing. Formal methods, as effective techniques, are receiving increasing attention for this purpose. Given the superposition and entanglement involved in quantum algorithms, formalizing them reasonably and intuitively is a big challenge. In this work, Coloured Petri Nets (CPN), a sophisticated extension of Petri nets, are used to represent quantum states in Hilbert space, and a CPN-based formal approach for quantum algorithms is proposed. The process of constructing a CPN quantum model is then demonstrated by modeling the Deutsch algorithm: simulations of the algorithm are conducted within the model, intermediate quantum states are recorded, and the simulation results are analyzed. The quantum properties of the Deutsch algorithm, such as superposition and entanglement among quantum states, can be formalized reasonably, demonstrating the rationality and effectiveness of the model. Finally, the paper verifies the model’s capability to model complex-valued operations on entangled qubits through the construction of a simple circuit.
Citations: 0
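For readers unfamiliar with the Deutsch algorithm mentioned in the abstract above, the following is a minimal plain state-vector simulation. It is not the Coloured Petri Net model the paper proposes; it only illustrates the algorithm being modeled. The function names and the list-of-four-amplitudes representation are assumptions of this sketch.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)


def hadamard(state, qubit):
    """Apply a Hadamard gate to qubit 0 (x) or qubit 1 (y) of a 2-qubit state.

    The state is a list of 4 real amplitudes indexed by 2 * x + y.
    """
    new = [0.0] * 4
    for idx, amp in enumerate(state):
        x, y = idx >> 1, idx & 1
        bit = x if qubit == 0 else y
        for out in (0, 1):
            # H maps |b> to (|0> + (-1)^b |1>) / sqrt(2).
            sign = -1.0 if (bit and out) else 1.0
            nx, ny = (out, y) if qubit == 0 else (x, out)
            new[2 * nx + ny] += sign * INV_SQRT2 * amp
    return new


def oracle(state, f):
    """Apply U_f: |x, y> -> |x, y XOR f(x)>."""
    new = [0.0] * 4
    for idx, amp in enumerate(state):
        x, y = idx >> 1, idx & 1
        new[2 * x + (y ^ f(x))] += amp
    return new


def deutsch(f):
    """Classify f: {0,1} -> {0,1} as 'constant' or 'balanced' with one oracle call."""
    state = [0.0, 1.0, 0.0, 0.0]             # start in |0>|1>
    state = hadamard(hadamard(state, 0), 1)  # superposition on both qubits
    state = oracle(state, f)                 # single query to f
    state = hadamard(state, 0)               # interfere on the first qubit
    prob_x_is_one = state[2] ** 2 + state[3] ** 2
    return "balanced" if prob_x_is_one > 0.5 else "constant"
```

Calling `deutsch(lambda x: x)` classifies the identity function as balanced, while `deutsch(lambda x: 0)` classifies the zero function as constant. Recording `state` after each step would give the kind of intermediate quantum states the abstract says are logged during simulation.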
Identifying key AI challenges in make-to-order manufacturing organisations: A multiple case study
IF 4.1 Zone 2 Computer Science
Journal of Systems and Software Pub Date: 2025-07-25 DOI: 10.1016/j.jss.2025.112559
Jonatan Flyckt, Tony Gorschek, Daniel Mendez, Niklas Lavesson
Volume 230, Article 112559
Abstract: Artificial Intelligence can make manufacturing organisations more effective and efficient, but it is not clear which AI tasks hold the greatest potential. Make-to-order manufacturers must constantly adapt to customers’ unique and rapidly changing needs, and therefore face different challenges than make-to-stock manufacturers. Our ambition is to develop an AI-enabled software system to support manufacturing organisations in improving their processes. To this end, we first seek to understand the data and technology requirements for key AI-enabled tasks in a make-to-order setting and determine the level of performance and explainability needed to address them. We perform a multiple case study of five make-to-order packaging manufacturers, interviewing personnel from sales, production, and supply chain to identify and prioritise operational challenges suitable for AI approaches. Demand forecasting emerges as the most important task, followed by predictive maintenance, quality inspection, complex decision risk estimation, and production planning. Participants emphasise the importance of explainable techniques to ensure trust in the systems. The results highlight a need for greater control of the production process and a better understanding of customer needs. Although most of the tasks could be solved with current techniques, some, such as intermittent demand forecasting and complex decision risk estimation, would require further development. The study clarifies the potential of AI-enabled systems in make-to-order manufacturing and outlines the steps required to realise it.
Citations: 0
Piloting Copilot, Codex, and StarCoder2: Hot temperature, cold prompts, or black magic?
IF 4.1 Zone 2 Computer Science
Journal of Systems and Software Pub Date: 2025-07-25 DOI: 10.1016/j.jss.2025.112562
Jean-Baptiste Döderlein, Nguessan Hermann Kouadio, Mathieu Acher, Djamel Eddine Khelladi, Benoit Combemale
Volume 230, Article 112562
Abstract: Language models are promising solutions for tackling increasingly complex problems. In software engineering, they recently gained attention in code assistants, which generate programs from a natural language task description (prompt). They have the potential to save time and effort but remain poorly understood, limiting their optimal use. In this article, we investigate the impact of input variations on two configurations of a language model, focusing on parameters such as task description, surrounding context, model creativity, and the number of generated solutions. We design specific operators to modify these inputs and apply them to three LLM-based code assistants (Copilot, Codex, StarCoder2) and two benchmarks representing algorithmic problems (HumanEval, LeetCode). Our study examines whether these variations significantly affect program quality and how these effects generalize across models.
Our results show that varying input parameters can greatly improve performance, achieving up to 79.27% success in one-shot generation compared to 22.44% for Codex and 31.1% for Copilot in default settings. Realizing this potential in practice is challenging due to the complex interplay observed in our study: the optimal settings for temperature, prompt, and number of generated solutions vary by problem.
Reproducing our study with StarCoder2 confirms these findings, indicating they are not model-specific. We also uncover surprising behaviors (e.g., fully removing the prompt can be effective), revealing model brittleness and areas for improvement.
Overall, this work opens opportunities to envision (automated) strategies for enhancing the performance of language model-based code assistants, but also questions their reliability and robustness.
Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
Citations: 0