Bakheet Aljedaani , Aakash Ahmad , Mahdi Fehmideh , Arif Ali Khan , Jun Shen
{"title":"An exploration study on developing blockchain systems–the practitioners' perspective","authors":"Bakheet Aljedaani , Aakash Ahmad , Mahdi Fehmideh , Arif Ali Khan , Jun Shen","doi":"10.1016/j.infsof.2025.107825","DOIUrl":"10.1016/j.infsof.2025.107825","url":null,"abstract":"<div><h3>Context</h3><div>Blockchain-based software (BBS) builds upon the foundational technologies of cryptocurrencies like Bitcoin, utilising decentralised, immutable ledgers to support the development and operation of security-critical and transaction-intensive systems and services. In recent years, a number of research studies have investigated the strategic benefits and technical limitations of BBS, which is central to the operations of a wide variety of systems in domains ranging from cyber security and healthcare to education and financial technologies. Despite an increasing interest both from academia and industry in BBS, there is a dearth of empirical evidence, resulting in a lack of understanding about processes, methods, and techniques to enable the systematic development of this class of software systems.</div></div><div><h3>Objectives</h3><div>Existing research lacks a consolidated view, particularly empirically-driven guidelines based on published evidence and development practices. Therefore, our objective is to derive new or leverage existing development processes, patterns, and models to design, implement, and validate BBS systems.</div></div><div><h3>Method</h3><div>To address this knowledge gap, we conducted a two-phase study that unifies the findings of (i) a systematic literature review and (ii) a practitioners’ survey to derive and validate the development process for BBS systems. First, we conducted a systematic literature review of 58 studies to derive a process comprising 26 activities for developing BBS systems. 
We then engaged 102 blockchain practitioners from 35 countries across 6 continents to validate the BBS system development process.</div></div><div><h3>Results</h3><div>Our results revealed a statistically significant difference (<em>p</em>-value < .001) in the importance ratings of 24 out of 26 BBS activities by our participants. The only two activities that were not statistically significant were incentive protocol design and granularity design. Our study also presented some of the activities that have been emphasised by our participants within the different development phases (i.e., Analysis Phase, Design Phase, Implementation Phase, Deployment Phase, and Execution and Maintenance Phase).</div></div><div><h3>Conclusion</h3><div>Our research is among the first to advance understanding of the development process for BBS, and it helps researchers and practitioners identify challenges and recommendations associated with the development of BBS systems.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107825"},"PeriodicalIF":3.8,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144490826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liang Yu , Emil Alégroth , Panagiota Chatzipetrou , Tony Gorschek
{"title":"Measuring the quality of generative AI systems: Mapping metrics to quality characteristics — Snowballing literature review","authors":"Liang Yu , Emil Alégroth , Panagiota Chatzipetrou , Tony Gorschek","doi":"10.1016/j.infsof.2025.107802","DOIUrl":"10.1016/j.infsof.2025.107802","url":null,"abstract":"<div><h3>Context</h3><div>Generative Artificial Intelligence (GenAI) and the use of Large Language Models (LLMs) have revolutionized tasks that previously required significant human effort, which has attracted considerable interest from industry stakeholders. This growing interest has accelerated the integration of AI models into various industrial applications. However, the model integration introduces challenges to product quality, as conventional quality measuring methods may fail to assess GenAI systems. Consequently, evaluation techniques for GenAI systems need to be adapted and refined. Examining the current state and applicability of evaluation techniques for the GenAI system outputs is essential.</div></div><div><h3>Objective</h3><div>This study aims to explore the current metrics, methods, and processes for assessing the outputs of GenAI systems and the potential of risky outputs.</div></div><div><h3>Method</h3><div>We performed a snowballing literature review to identify metrics, evaluation methods, and evaluation processes from 43 selected papers.</div></div><div><h3>Results</h3><div>We identified 28 metrics and mapped these metrics to four quality characteristics defined by the ISO/IEC 25023 standard for software systems. Additionally, we discovered three types of evaluation methods to measure the quality of system outputs and a three-step process to assess faulty system outputs. 
Based on these insights, we suggested a five-step framework for measuring system quality while utilizing GenAI models.</div></div><div><h3>Conclusion</h3><div>Our findings present a mapping that visualizes candidate metrics to be selected for measuring quality characteristics of GenAI systems, accompanied by step-by-step processes to assist practitioners in conducting quality assessments.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107802"},"PeriodicalIF":3.8,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144336020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matheus Araújo Aguiar , Elvira Albert , Samir Genaim , Pablo Gordillo , Alejandro Hernández-Cerezo , Daniel Kirchner , Albert Rubio
{"title":"Neural-guided superoptimization in ethereum","authors":"Matheus Araújo Aguiar , Elvira Albert , Samir Genaim , Pablo Gordillo , Alejandro Hernández-Cerezo , Daniel Kirchner , Albert Rubio","doi":"10.1016/j.infsof.2025.107800","DOIUrl":"10.1016/j.infsof.2025.107800","url":null,"abstract":"<div><h3>Context:</h3><div>Superoptimization is a synthesis technique that, given a <em>loop-free sequence</em> of instructions, searches for an equivalent sequence that is <em>optimal wrt.</em> an objective function. Superoptimization of Ethereum smart contracts aims at minimizing the <em>size of their bytecode</em> and the <em>gas consumption</em> of executing the contract’s functions. The search for the optimal solution poses huge computational demands – as the search space to find the optimal sequence is exponential in the given <em>size-bound</em> – and is the main challenge in scaling superoptimization up to real, industrial software. Even if the underlying problem for finding the optimal solution is decidable, practical tools often prioritize efficiency over completeness. This means they might be implemented to find a sub-optimal solution or even time out.</div></div><div><h3>Objective:</h3><div>This work aims at applying superoptimization in a real setting: the Ethereum blockchain. 
This paper proposes a <em>neural-guided superoptimization</em> (NGS) approach which incorporates deep neural networks using (supervised) learning into superoptimization to improve scalability by predicting: (1) if a sequence is already optimal and hence the search can be skipped; (2) the size-bound for the optimal solution in order to reduce the search space.</div></div><div><h3>Method:</h3><div>We have downloaded over 13,000 smart contracts deployed on the blockchain for training and testing the machine learning models, and a disjoint set with 100 of the smart contracts with more transactions to prove our scalability gains and impact for the Ethereum community.</div></div><div><h3>Results:</h3><div>Incorporating DNNs resulted in a 16x overall speedup (12x for gas) with only 12% optimization loss (14% for gas), or a 3-4x speedup with no optimization loss. For the 100 analyzed contracts, this approach reduced the average compilation time to 3 min per contract and achieved monetary savings of $1.24M.</div></div><div><h3>Conclusions:</h3><div>The integration of machine learning models mitigates several limitations of traditional superoptimization by drastically reducing execution times while maintaining most of the original optimization gains.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107800"},"PeriodicalIF":3.8,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144339006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging synthetic trace generation of modeling operations for intelligent modeling assistants using large language models","authors":"Vittoriano Muttillo , Claudio Di Sipio , Riccardo Rubei , Luca Berardinelli","doi":"10.1016/j.infsof.2025.107806","DOIUrl":"10.1016/j.infsof.2025.107806","url":null,"abstract":"<div><h3>Context:</h3><div>Due to the proliferation of generative AI models in different software engineering tasks, the research community has started to exploit those models, spanning from requirement specification to code development. Model-Driven Engineering (MDE) is a paradigm that leverages software models as primary artifacts to automate tasks. In this respect, modelers have started to investigate the interplay between traditional MDE practices and Large Language Models (LLMs) to push automation. Although powerful, LLMs exhibit limitations that undermine the quality of generated modeling artifacts, e.g., hallucination or incorrect formatting. Recording modeling operations relies on human-based activities to train modeling assistants, helping modelers in their daily tasks. Nevertheless, those techniques require a huge amount of training data that may not be available due to several factors, e.g., security or privacy issues.</div></div><div><h3>Objective:</h3><div>In this paper, we propose an extension of a conceptual MDE framework, called MASTER-LLM, that combines different MDE tools and paradigms to support industrial and academic practitioners.</div></div><div><h3>Method:</h3><div>MASTER-LLM comprises a modeling environment that acts as the active context in which a dedicated component records modeling operations. Then, model completion is enabled by the modeling assistant trained on past operations. 
Different LLMs are used to generate a new dataset of modeling events to speed up recording and data collection.</div></div><div><h3>Results:</h3><div>To evaluate the feasibility of MASTER-LLM in practice, we experiment with two modeling environments, i.e., CAEX and HEPSYCODE, employed in industrial use cases within European projects. We investigate how the examined LLMs can generate realistic modeling operations in different domains.</div></div><div><h3>Conclusion:</h3><div>We show that synthetic traces can be effectively used when the application domain is less complex, while complex scenarios require human-based operations or a mixed approach according to data availability. However, generative AI models must be assessed using proper methodologies to avoid security issues in industrial domains.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107806"},"PeriodicalIF":3.8,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144290857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alejandra Duque-Torres , Claus Klammer , Stefan Fischer , Dietmar Pfahl , Rudolf Ramler
{"title":"Assessing the strength of Metamorphic Testing applied to optimisation software—Experience from industry","authors":"Alejandra Duque-Torres , Claus Klammer , Stefan Fischer , Dietmar Pfahl , Rudolf Ramler","doi":"10.1016/j.infsof.2025.107807","DOIUrl":"10.1016/j.infsof.2025.107807","url":null,"abstract":"<div><h3>Context:</h3><div>The testing of optimisation algorithms (OAs) is difficult due to the test oracle problem. Metamorphic Testing (MT) addresses this challenge. We previously applied MT to a black-box industrial OA and gained first insights into applying MT for OA testing.</div></div><div><h3>Objective:</h3><div>We noticed that some of the identified Metamorphic Relations (MRs) seemed to be more powerful than others. We now define and evaluate an approach to assess and rank MRs in terms of their defect-detection capabilities.</div></div><div><h3>Method:</h3><div>We propose a three-phase approach for assessing the strength of MRs. First, we evaluate the applicability of each MR based on Test Data (TD) using MetaTrimmer, an approach for selecting and constraining MRs. Second, we generate System Under Test (SUT) mutants and classify them into three levels. Level one contains mutants that do not change their behaviour for all TD (equivalent mutants). Level two contains all mutants that change behaviour for every TD (trivial mutants). Level three contains all other mutants. Third, we assess MR effectiveness per level based on the MR’s violation/non-violation ratio.</div></div><div><h3>Results:</h3><div>Among 405 generated SUT mutants analysed, 236 fell into level one, 85 into level two, and 84 into level three. The analysis of the amount of TD triggering an MR violation in each level per mutant revealed that some MRs have higher sensitivity than others.</div></div><div><h3>Conclusion:</h3><div>Our findings show that assessing MRs using our three-level strategy provides clear insights into their defect-detection capabilities. 
By integrating MetaTrimmer with mutation testing, we identify MRs that are effective in catching faults and sensitive to variations in TD. While evaluated on an OA in an industrial setting, this approach is generalisable to other SUTs. It is actionable for practitioners and researchers seeking to identify robust MRs and offers a structured methodology to evaluate and rank them.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107807"},"PeriodicalIF":3.8,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144297131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manuel Quintela-Pumares, Daniel Fernández-Lanvin, Alberto-Manuel Fernandez-Alvarez
{"title":"Heuristic-based incremental local domain model generation","authors":"Manuel Quintela-Pumares, Daniel Fernández-Lanvin, Alberto-Manuel Fernandez-Alvarez","doi":"10.1016/j.infsof.2025.107817","DOIUrl":"10.1016/j.infsof.2025.107817","url":null,"abstract":"<div><h3>Context</h3><div>Current front-end frameworks and technologies enable rich clients to operate autonomously without frequent server requests. To achieve this autonomy, clients must maintain a Local Domain Model (LDM), often derived from the Global Domain Model (GDM) on the backend. Manually designing an LDM that is consistent with the GDM requires handling nuanced dependencies, an error-prone task where oversights easily occur.</div></div><div><h3>Objective</h3><div>We aim to address these challenges by: (a) formally mapping dependencies between GDM and LDM; (b) analyzing effort and errors when modelling without assistance; and (c) providing a semi-automated method leveraging these dependencies to significantly reduce both effort and errors.</div></div><div><h3>Method</h3><div>To achieve these objectives, we propose a heuristic-based, step-by-step guided method. This approach leverages pre-existing GDM information to incrementally uncover dependencies and automate LDM construction as designers identify local behavior of GDM elements. We assessed this method's impact through an empirical experiment where we aimed to identify common mistakes and quantify effort during LDM construction. Expert UML modelers completed an LDM creation task both manually and with our tool-supported method. We recorded errors and interactive effort to establish a baseline and measure impact. User perceptions were gathered via a survey; an analytical usability study based on GOMS complemented findings.</div></div><div><h3>Results</h3><div>The proportion of users committing errors decreased by 77.8 % with the tool, and the average error count per user was reduced by 97.3 %. 
Time to complete the task decreased by 35.0 % and interactive effort by 44.6 %, consistent with GOMS predictions. Surveys showed a majority of positive responses across all items.</div></div><div><h3>Conclusions</h3><div>Our approach effectively streamlines Local Domain Model creation. By automatically detecting dependencies and guiding designers, the tool drastically reduces error rates, cuts completion time, and lowers interaction volume. Expert users rated the method positively, affirming that benefits of guided, incremental LDM construction outweigh adoption effort.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107817"},"PeriodicalIF":3.8,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144329797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trust, transparency, and adoption in generative AI for software engineering: Insights from Twitter discourse","authors":"Manaal Basha, Gema Rodríguez-Pérez","doi":"10.1016/j.infsof.2025.107804","DOIUrl":"10.1016/j.infsof.2025.107804","url":null,"abstract":"<div><h3>Context:</h3><div>The rise of AI-driven coding assistants, such as GitHub Copilot and ChatGPT, is transforming software development practices. Despite their growing impact, informal user feedback on these tools is often neglected.</div></div><div><h3>Objective:</h3><div>This study aims to analyze Twitter/X conversations to understand user opinions on the benefits, challenges, and barriers associated with Code Generation Tools (CGTs) in software engineering. By incorporating diverse perspectives from developers, hobbyists, students, and critics, this research provides a comprehensive view of public sentiment.</div></div><div><h3>Methods:</h3><div>We employed a hybrid approach using BERTopic and open coding to collect and analyze data from approximately 90,000 tweets. The focus was on identifying themes and sentiments related to various CGTs. The study sought to determine the most frequently discussed topics and their related sentiment, and to highlight the recurring feedback or criticisms that could influence generative AI (GenAI) adoption in software engineering.</div></div><div><h3>Results:</h3><div>Our analysis identified several significant themes, including productivity enhancements, shifts in developer practices, regulatory uncertainty, and a demand for neutral GenAI content. While some users praised the efficiency benefits of CGTs, others raised concerns regarding intellectual property, transparency, and potential biases.</div></div><div><h3>Conclusion:</h3><div>The findings highlight that addressing issues of trust, accountability, and legal clarity is essential for the successful integration of CGTs in software development. 
These insights underscore the need for ongoing dialogue and refinement of CGTs to better align with user expectations and mitigate concerns.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107804"},"PeriodicalIF":3.8,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144297129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kai-Kristian Kemell , Matti Saarikallio , Anh Nguyen-Duc , Pekka Abrahamsson
{"title":"Still just personal assistants? – A multiple case study of generative AI adoption in software organizations","authors":"Kai-Kristian Kemell , Matti Saarikallio , Anh Nguyen-Duc , Pekka Abrahamsson","doi":"10.1016/j.infsof.2025.107805","DOIUrl":"10.1016/j.infsof.2025.107805","url":null,"abstract":"<div><h3>Context:</h3><div>Generative AI (GenAI) is argued to transform software engineering (SE) in various ways, and GenAI tools show promise for various SE tasks. Software organizations across the globe are currently exploring the use of GenAI for SE.</div></div><div><h3>Objective:</h3><div>While numerous studies have recently been published on GenAI, few studies have looked at the adoption of these tools and their usage from an organizational point of view, focusing instead on individual users. Our objective is to understand how organizations adopt these tools and what their impacts are in industrial contexts, with a focus on the European perspective.</div></div><div><h3>Method:</h3><div>We conducted a multiple case study of seven European companies. We collected data through semi-structured interviews (n=15), as well as through longitudinal observation in one case company. All data were analyzed using thematic analysis.</div></div><div><h3>Results:</h3><div>We analyzed 28 transcripts, resulting in 456 quotations and 557 code occurrences split between 66 individual codes that were categorized under 6 high-level themes. We identified 25 types of tasks GenAI was currently being used for in our case organizations. We identified 12 benefits for GenAI in SE and 10 adoption and use challenges. Key adoption challenges for organizations include data privacy and legislative concerns, the emerging and fast-moving market of GenAI tools, difficulty of measuring the positive impact of the tools, and potential change resistance. 
For individuals, the key challenges are related to prompting, such as understanding what a good prompt is, and how to write prompts for specific tasks.</div></div><div><h3>Conclusion:</h3><div>GenAI adoption is becoming widespread in SE, but good practices and use cases are still emerging. While GenAI can potentially produce various benefits in SE, companies and individual users are facing various challenges in making the most of GenAI in SE. Overall, GenAI is still primarily used as a personal assistant.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107805"},"PeriodicalIF":3.8,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144313726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Does it smell? A homogeneous stacking approach for code smell prediction","authors":"Rim El Jammal, Danielle Azar","doi":"10.1016/j.infsof.2025.107801","DOIUrl":"10.1016/j.infsof.2025.107801","url":null,"abstract":"<div><h3>Context:</h3><div>Code smells, defined as detrimental patterns and design choices in software development, significantly impact various aspects of software quality, such as maintainability, reusability, and stability. These harmful effects can disrupt the software development cycle and result in a waste of development and managerial resources.</div></div><div><h3>Objective:</h3><div>Although code smell detection has attracted considerable attention in recent years, the existing literature still shows certain limitations: most studies have been conducted on small data sets, consider only a small number of code smells at once, and are evaluated using few performance metrics.</div></div><div><h3>Methods:</h3><div>In this work, we propose a Homogeneous Stacking Classifier to predict the presence of nine different code smells. 
We resort to feature selection to keep the attributes relevant to each code smell.</div></div><div><h3>Results:</h3><div>We use a large data set of 19,000 instances and evaluate the performance of our proposed model using eight different metrics, comparing it to state-of-the-art machine learning techniques that have proven to perform well in current research.</div></div><div><h3>Conclusion:</h3><div>The proposed approach statistically significantly outperforms the other models in most cases, affirming its efficacy in code smell detection.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107801"},"PeriodicalIF":3.8,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144280980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SVA-ICL: Improving LLM-based software vulnerability assessment via in-context learning and information fusion","authors":"Chaoyang Gao , Xiang Chen , Guangbei Zhang","doi":"10.1016/j.infsof.2025.107803","DOIUrl":"10.1016/j.infsof.2025.107803","url":null,"abstract":"<div><h3>Context:</h3><div>Software vulnerability assessment (SVA) is critical for identifying, evaluating, and prioritizing security weaknesses in software applications.</div></div><div><h3>Objective:</h3><div>Despite the increasing application of large language models (LLMs) in various software engineering tasks, their effectiveness in SVA remains underexplored.</div></div><div><h3>Method:</h3><div>To address this gap, we introduce a novel approach SVA-ICL, which leverages in-context learning (ICL) to enhance LLM performance. Our approach involves the selection of high-quality demonstrations for ICL through information fusion, incorporating both source code and vulnerability descriptions. For source code, we consider semantic, lexical, and syntactic similarities, while for vulnerability descriptions, we focus on textual similarity. Based on the selected demonstrations, we construct context prompts and consider DeepSeek-V2 as the LLM for SVA-ICL.</div></div><div><h3>Results:</h3><div>We evaluate the effectiveness of SVA-ICL using a large-scale dataset comprising 12,071 C/C++ vulnerabilities. Experimental results demonstrate that SVA-ICL outperforms state-of-the-art SVA baselines in terms of Accuracy, F1-score, and MCC measures. 
Furthermore, ablation studies highlight the significance of component customization in SVA-ICL, such as the number of demonstrations, the demonstration ordering strategy, and the optimal fusion ratio of different modalities.</div></div><div><h3>Conclusion:</h3><div>Our findings suggest that leveraging ICL with information fusion can improve the effectiveness of LLM-based SVA, warranting further research in this direction.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107803"},"PeriodicalIF":3.8,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144262650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}