{"title":"On the effectiveness of large language models for query expansion in code search","authors":"Xiangzheng Liu , Jianxun Liu , Guosheng Kang , Min Shi , Yi Liu , Yiming Yin","doi":"10.1016/j.jss.2025.112582","DOIUrl":"10.1016/j.jss.2025.112582","url":null,"abstract":"<div><div>Language Models (LMs) are deep learning models trained on massive amounts of text data. One of their main advantages is their superior language understanding. This study explores applying the language understanding capabilities of Large Language Models (LLMs) to code search query expansion. To this end, we collected a query corpus from multiple data sources and trained multiple LMs (GPT-2, BERT) on this query corpus using a self-supervised task. The trained LMs are then used to expand the input query. We evaluate the performance of these LLMs on the CodeSearchNet dataset using two state-of-the-art code search methods (GraphCodeBERT and CoCoSoda) and compare these LLMs with currently popular expansion methods. Experimental results show that LLM-based query expansion methods outperform existing query reformulation methods in most cases.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"230 ","pages":"Article 112582"},"PeriodicalIF":4.1,"publicationDate":"2025-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144770918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards higher quality software vulnerability data using LLM-based patch filtering","authors":"Charlie Dil , Hui Chen , Kostadin Damevski","doi":"10.1016/j.jss.2025.112581","DOIUrl":"10.1016/j.jss.2025.112581","url":null,"abstract":"<div><div>High-quality vulnerability patch data is essential for understanding vulnerabilities in software systems. Accurate patch data sheds light on the nature of vulnerabilities, their origins, and effective remediation strategies. However, current data collection efforts prioritize rapid release over quality, leading to patches that are incomplete or contain extraneous changes. In addition to supporting vulnerability analysis, high-quality patch data improves automatic vulnerability prediction models, which require reliable inputs to predict issues in new or existing code.</div><div>In this paper, we explore using large language models (LLMs) to filter vulnerability data by identifying and removing low-quality instances. Trained on large textual corpora including source code, LLMs offer new opportunities to improve data accuracy. Our goal is to leverage LLMs for reasoning-based assessments of whether a code hunk fixes a described vulnerability. We evaluate several prompting strategies and find that Generated Knowledge Prompting, where the model first explains a hunk’s effect, then assesses whether it fixes the bug, is most effective across three LLMs. Applying this filtering to the BigVul dataset, we show a 7%–9% improvement in prediction precision for three popular vulnerability prediction models. 
Recall declines slightly (by 2%–8%) across models, likely reflecting the reduced dataset size.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"230 ","pages":"Article 112581"},"PeriodicalIF":4.1,"publicationDate":"2025-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144770919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AMACollision: An advanced framework for testing autonomous vehicles based on adversarial multi-agent","authors":"Tiexin Wang, Shuo Tian, Gulent Asalif Minas, Chunyang Bian","doi":"10.1016/j.jss.2025.112578","DOIUrl":"10.1016/j.jss.2025.112578","url":null,"abstract":"<div><div>Autonomous driving, a typical safety-critical domain, requires strict safety evaluation. Owing to advantages such as high efficiency and low cost, simulation testing is an important means of enabling this evaluation. Current research on simulation testing of Autonomous Driving Systems (ADSs) mainly aims to identify safety-critical driving scenarios. However, the realism of the generated scenarios and the generation efficiency remain two open challenges. Therefore, we propose <em>AMACollision</em>, a multi-agent-based framework for generating highly realistic critical driving scenarios. <em>AMACollision</em> employs a deep neural network that integrates multi-modal sensor fusion and temporal decision-making, enabling robust scene understanding and efficient training of the multiple agents that act as Non-Player Characters (NPCs). To enhance the realism of the generated scenarios, we introduce a two-stage reward mechanism that ensures NPCs’ behaviors comply with traffic regulations. To evaluate the performance of <em>AMACollision</em>, we integrate it with a high-fidelity simulator and conduct extensive experiments testing two ADSs on three different road structures. 
Experimental results across seven metrics (e.g., collision rate) demonstrate that <em>AMACollision</em> outperforms the state-of-the-art method in both the realism of the generated scenarios and the generation efficiency.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"230 ","pages":"Article 112578"},"PeriodicalIF":4.1,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144757451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated spectrum-based model fault localization within a search framework","authors":"Ting Shu , Xinru Xue , Xuesong Yin , Jinsong Xia","doi":"10.1016/j.jss.2025.112576","DOIUrl":"10.1016/j.jss.2025.112576","url":null,"abstract":"<div><div>Models play a crucial role in software engineering, guiding development processes and ensuring quality. Extended Finite State Machines (EFSMs) effectively model complex systems, but ensuring their correctness is a challenging and critical task. This study aims to improve spectrum-based fault localization accuracy in EFSMs by automating the optimization of risk evaluation formula combinations. We propose a Model Fault Localization Framework (MFLF) that automates the generation, selection, and combination of these formulas. Within this framework, we develop the KWOA-LTR method, which employs K-Means clustering to classify formulas based on their fault localization capabilities and the Sum of Squared Errors (SSE) metric to determine the optimal number <span><math><mi>K</mi></math></span> of formulas. The Whale Optimization Algorithm (WOA) is then used to select an optimal combination of risk evaluation formulas. Subsequently, a learning-to-rank algorithm constructs a fault localization ranking model that integrates the suspiciousness scores derived from these formulas. Using this trained model, KWOA-LTR locates faults efficiently and precisely. Extensive empirical evaluations on 10 representative benchmark EFSMs demonstrate that KWOA-LTR improves fault localization precision, stability, and overall performance, outperforming existing methods and reducing the manual effort required. The MFLF framework has the potential to support automated spectrum-based fault localization in model-based systems. Moreover, KWOA-LTR within this framework is competitive in terms of fault localization performance. 
The code is publicly available at https://github.com/Renee0715/GMFLF.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"230 ","pages":"Article 112576"},"PeriodicalIF":4.1,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144724022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Insights into resource utilization of code small language models serving with runtime engines and execution providers","authors":"Francisco Durán , Matias Martinez , Patricia Lago , Silverio Martínez-Fernández","doi":"10.1016/j.jss.2025.112574","DOIUrl":"10.1016/j.jss.2025.112574","url":null,"abstract":"<div><div>The rapid growth of language models, particularly in code generation, requires substantial computational resources, raising concerns about energy consumption and environmental impact. Optimizing the resource utilization of language model inference is crucial, and Small Language Models (SLMs) offer a promising way to reduce resource demands. Our goal is to analyze the impact of deep learning serving configurations, defined as combinations of runtime engines and execution providers, on resource utilization (energy consumption, execution time, and computing-resource utilization), from the point of view of software engineers conducting inference with code generation SLMs. We ran a technology-oriented, multi-stage experimental pipeline using twelve code generation SLMs to investigate energy consumption, execution time, and computing-resource utilization across the configurations. Significant differences emerged across configurations. CUDA execution provider configurations outperformed CPU execution provider configurations in both energy consumption and execution time. Among the configurations, TORCH paired with CUDA demonstrated the greatest energy efficiency, achieving energy savings of 37.99% to 89.16% compared to other serving configurations. Similarly, optimized runtime engines such as ONNX with the CPU execution provider achieved energy savings of 8.98% to 72.04% within CPU-based configurations. TORCH paired with CUDA also exhibited efficient computing-resource utilization. The choice of serving configuration significantly impacts resource utilization. 
While further research is needed, we recommend the above configurations as best suited to software engineers’ requirements for improving serving resource-utilization efficiency.</div><div><em>Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board</em>.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"230 ","pages":"Article 112574"},"PeriodicalIF":4.1,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144766994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating time-dependent methods and seasonal effects in code technical debt prediction","authors":"Mikel Robredo , Nyyti Saarimäki , Matteo Esposito , Davide Taibi , Rafael Peñaloza , Valentina Lenarduzzi","doi":"10.1016/j.jss.2025.112545","DOIUrl":"10.1016/j.jss.2025.112545","url":null,"abstract":"<div><h3>Background:</h3><div>Code Technical Debt (Code TD) prediction has gained significant attention in recent software engineering research. However, no standardized approach to Code TD prediction fully captures the factors influencing its evolution.</div></div><div><h3>Objective:</h3><div>Our study aims to assess the impact of time-dependent models and seasonal effects on Code TD prediction. It evaluates such models against widely used Machine Learning models, also considering the influence of seasonality on prediction performance.</div></div><div><h3>Methods:</h3><div>We trained 11 prediction models on 31 Java open-source projects. To assess their performance, we predicted future observations of the SQALE index. To evaluate the practical usability of our TD forecasting models and their impact on practitioners, we surveyed 23 software engineering professionals.</div></div><div><h3>Results:</h3><div>Our study confirms the benefits of time-dependent techniques, with the ARIMAX model outperforming the others. Seasonal effects improved predictive performance, though the impact remained modest. ARIMAX/SARIMAX models were shown to provide well-balanced long-term forecasts. The survey highlighted strong industry interest in short- to medium-term TD forecasts.</div></div><div><h3>Conclusions:</h3><div>Our findings support using techniques that capture time dependence in historical software metric data, particularly for Code TD. 
Acting effectively on this evidence requires adopting methods that account for temporal patterns.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"230 ","pages":"Article 112545"},"PeriodicalIF":4.1,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144766828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overcoming experimentation challenges in software ecosystems of large product and service organizations: A participatory action research study","authors":"Shady Hegazy , Christoph Elsner , Jan Bosch , Helena Holmström-Olsson","doi":"10.1016/j.jss.2025.112550","DOIUrl":"10.1016/j.jss.2025.112550","url":null,"abstract":"<div><div>Software ecosystems facilitate collaborative innovation and value co-creation among diverse actors through shared technological platforms. However, introducing experimentation practices, such as A/B testing, into these ecosystems within large organizations presents significant challenges due to complex structures, network effects, and complicated organizational dynamics. The challenge is greater for product and service organizations, especially in business-to-business (B2B) or industrial domains. This paper presents an action research study that aims to overcome the barriers to adopting experimentation-based evolution approaches. The study was conducted within a large software-intensive product and service organization with a vast portfolio of software ecosystems spanning a wide range of business domains.</div><div>Following a participatory action research methodology, the research team worked closely with the participating organization through three iterative cycles of planning, action, observation, and reflection. Data sources included a systematic literature review, expert interviews with 25 participants across 17 software ecosystems, and collaborative workshops with internal stakeholders. 
The study identifies key organizational, technical, and cultural challenges to introducing experimentation in software ecosystems of large organizations, particularly in business-to-business or industrial domains, and presents an example roadmap for iteratively addressing such challenges.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"230 ","pages":"Article 112550"},"PeriodicalIF":4.1,"publicationDate":"2025-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144724021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying key AI challenges in make-to-order manufacturing organisations: A multiple case study","authors":"Jonatan Flyckt , Tony Gorschek , Daniel Mendez , Niklas Lavesson","doi":"10.1016/j.jss.2025.112559","DOIUrl":"10.1016/j.jss.2025.112559","url":null,"abstract":"<div><div>Artificial Intelligence can make manufacturing organisations more effective and efficient, but it is not clear which AI tasks hold the greatest potential. Make-to-order manufacturers must constantly adapt to customers’ unique and rapidly changing needs, and therefore face different challenges from make-to-stock manufacturers. Our ambition is to develop an AI-enabled software system to support manufacturing organisations in improving their processes. To this end, we first seek to understand the data and technology requirements for key AI-enabled tasks in a make-to-order setting and determine the level of performance and explainability needed to address them. We perform a multiple case study of five make-to-order packaging manufacturers, interviewing personnel from sales, production, and supply chain to identify and prioritise operational challenges suitable for AI approaches. Demand forecasting emerges as the most important task, followed by predictive maintenance, quality inspection, complex decision risk estimation, and production planning. Participants emphasise the importance of explainable techniques to ensure trust in the systems. The results highlight a need for greater control of the production process and a better understanding of customer needs. Although most of the tasks could be solved with current techniques, some, such as intermittent demand forecasting and complex decision risk estimation, would require further development. 
The study clarifies the potential of AI-enabled systems in make-to-order manufacturing and outlines the steps required to realise it.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"230 ","pages":"Article 112559"},"PeriodicalIF":4.1,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144724018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring developer experience factors in software ecosystems","authors":"Rodrigo Oliveira Zacarias , Léo Carvalho Ramos Antunes , Márcio de Oliveira Barros , Rodrigo Pereira dos Santos , Patricia Lago","doi":"10.1016/j.jss.2025.112549","DOIUrl":"10.1016/j.jss.2025.112549","url":null,"abstract":"<div><h3>Context:</h3><div>Developer experience (DX) plays a key role in developers’ performance and their continued involvement in a software ecosystem (SECO) platform. While researchers and practitioners have recognized several factors affecting DX in SECO platforms, a clear roadmap of the most influential factors is still missing. This gap matters because DX directly affects developers’ interest in a SECO and their ongoing engagement with the common technological platform.</div></div><div><h3>Goal:</h3><div>This work aims to identify key DX factors and understand how they influence third-party developers’ decisions to adopt and keep contributing to a SECO.</div></div><div><h3>Methods:</h3><div>We conducted a systematic mapping study (SMS), analyzing 29 studies to assess the state of the art of DX in SECO. Additionally, we conducted a Delphi study with 21 third-party developers to evaluate how strongly 27 DX factors (identified in our SMS) influence their decisions to adopt and keep contributing to a SECO.</div></div><div><h3>Results:</h3><div>The factors that most strongly influence developers’ adoption and ongoing contributions to a SECO are: “financial costs for using the platform”, “desired technical resources for development”, “low barriers to entry into the applications market”, and “more financial gains”.</div></div><div><h3>Conclusion:</h3><div>DX is essential for the success and sustainability of SECO. 
Our set of DX factors provides valuable insights and recommendations for researchers and practitioners to address key DX concerns from the perspective of third-party developers.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"230 ","pages":"Article 112549"},"PeriodicalIF":4.1,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144749136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From missile warhead to smart fridge: Interviews with industry experts on tracing safety- and security-relevant artifacts","authors":"Marc Herrmann , Alexander Specht , Abdurrahman Sekerci , Martin Obaidi , Marco Ehl , Duaa Adel Ali Elsofi , Katharina Großer , Jil Klünder , Jan Jürjens , Kurt Schneider","doi":"10.1016/j.jss.2025.112551","DOIUrl":"10.1016/j.jss.2025.112551","url":null,"abstract":"<div><div>Ensuring traceability of safety- and security-related artifacts is vital in software development to comply with standards and mitigate risks. Despite its importance, the practical implementation of defining and tracing safety- and security-relevant artifacts remains ambiguous. Based on eight semi-structured interviews with industry experts, this work explores the definitions, methods, processes, and challenges of tracing safety- and security-related artifacts. The interviews revealed that definitions of safety- and security-relevant artifacts are highly context-dependent, shaped by regulatory standards, internal processes, technical characteristics, and practitioner judgment. Rather than signaling a deficiency, this variability reflects the inherently multifaceted nature of safety and security work, where artifact classification emerges from practical reasoning rather than strict or universal criteria. Tools play a key role in supporting traceability, and cross-team alignment remains a concern in practice. Our findings provide actionable insights for organizations seeking to strengthen traceability. 
The recommendations encourage the development of internal classification criteria, support effective collaboration with external partners, facilitate guidance, onboarding, and training, and help align practices across teams, fostering more reliable and transparent management of safety- and security-relevant artifacts.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"230 ","pages":"Article 112551"},"PeriodicalIF":3.7,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}