{"title":"DiffBCE: Difference contrastive learning for binary code embeddings","authors":"Yun Zhang , Ge Cheng","doi":"10.1016/j.infsof.2025.107822","DOIUrl":"10.1016/j.infsof.2025.107822","url":null,"abstract":"<div><h3>Context:</h3><div>Binary code embedding plays a crucial role in binary similarity detection and software security analysis. However, conventional methods often suffer from scalability issues and depend heavily on large amounts of labeled data, limiting their practical deployment in real-world scenarios.</div></div><div><h3>Objectives:</h3><div>This research introduces DiffBCE, a novel binary code embedding method based on differential contrastive learning. The primary goal is to overcome the limitations of existing approaches by reducing the reliance on labeled data while enhancing the robustness and semantic sensitivity of binary code representations.</div></div><div><h3>Methods:</h3><div>DiffBCE integrates two complementary data augmentation strategies – insensitive transformations (implemented via dropout) and sensitive transformations (using instruction replacement with a Masked Language Model) – within a contrastive learning framework. In addition, a conditional difference prediction module is introduced to capture subtle semantic changes by identifying differences between original and transformed binary code. The model is jointly trained with a combined loss function balancing contrastive loss and conditional difference prediction loss. Experimental validation is performed on multiple binary datasets across various scenarios, including cross-version analysis, cross-optimization-level evaluation, and code obfuscation difference analysis.</div></div><div><h3>Results:</h3><div>Experimental evaluations demonstrate that DiffBCE significantly outperforms state-of-the-art methods (e.g., Asm2Vec, DeepBinDiff, PalmTree). 
Across three similarity detection scenarios, the method achieves improvements in F1 scores by approximately 3.8%, 5.6%, and 11.1%, respectively, underscoring its robustness and effectiveness in handling complex binary code differences.</div></div><div><h3>Conclusions:</h3><div>DiffBCE offers a scalable and efficient solution for binary code embedding by effectively capturing rich semantic features without requiring extensive labeled data. Its superior performance in various testing scenarios suggests promising applications in vulnerability detection, code reuse analysis, reverse engineering, and automated patch generation, paving the way for enhanced software security assessments.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"187 ","pages":"Article 107822"},"PeriodicalIF":3.8,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144633619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pablo Serrano-Gutierrez, Inmaculada Ayala, Lidia Fuentes
{"title":"Integrating energy consumption in the development of serverless applications","authors":"Pablo Serrano-Gutierrez, Inmaculada Ayala, Lidia Fuentes","doi":"10.1016/j.infsof.2025.107819","DOIUrl":"10.1016/j.infsof.2025.107819","url":null,"abstract":"<div><h3>Context:</h3><div>The increasing environmental impact of Information and Communication Technologies (ICTs), particularly the energy consumption associated with serverless applications, necessitates the development of methodologies to optimize energy efficiency. This study addresses the need for energy-aware design and runtime adaptation in serverless architectures.</div></div><div><h3>Objective:</h3><div>To develop and validate a methodology that integrates energy monitoring into the development and runtime management of serverless applications, thereby enabling significant reductions in energy consumption while maintaining functionality.</div></div><div><h3>Methods:</h3><div>A new version of FUSPAQ, a framework for the optimization of serverless applications, was developed. This version incorporates tools like Kepler for real-time energy monitoring and employs an energy-aware orchestration mechanism to dynamically select energy-efficient function configurations. Validation was conducted through a facial recognition case study and benchmark experiments, comparing energy consumption across different scenarios with and without the proposed adaptations.</div></div><div><h3>Results:</h3><div>The enhanced FUSPAQ framework successfully integrated energy consumption metrics into the decision-making process for function selection and runtime adaptation. Benchmark tests confirmed the scalability of the solution, with energy-efficient outcomes even in complex applications.</div></div><div><h3>Conclusion:</h3><div>The study highlights the potential of integrating energy-aware practices in serverless applications, presenting a scalable and practical approach to reducing their environmental footprint. 
By leveraging tools like Kepler and frameworks like FUSPAQ, developers can achieve significant energy savings without compromising application performance. This work contributes to the advancement of Green Software Engineering by emphasizing runtime energy adaptation in Function-as-a-Service (FaaS) architectures.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107819"},"PeriodicalIF":3.8,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144564048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bakheet Aljedaani , Aakash Ahmad , Mahdi Fehmideh , Arif Ali Khan , Jun Shen
{"title":"An exploration study on developing blockchain systems–the practitioners' perspective","authors":"Bakheet Aljedaani , Aakash Ahmad , Mahdi Fehmideh , Arif Ali Khan , Jun Shen","doi":"10.1016/j.infsof.2025.107825","DOIUrl":"10.1016/j.infsof.2025.107825","url":null,"abstract":"<div><h3>Context</h3><div>Blockchain-based software (BBS) builds upon the foundational technologies of cryptocurrencies like Bitcoin, utilising decentralised, immutable ledgers, to support the development and operation of security-critical and transaction-intensive systems and services. In recent years, a number of research studies have investigated the strategic benefits and technical limitations of BBS, which is central to the operations of a wide variety of systems ranging from cyber security, healthcare, education, and financial technologies. Despite an increasing interest both from academia and industry in BBS, there is a dearth of empirical evidence resulting in a lack of understanding about processes, methods, and techniques to enable a systematic development of this class of software systems.</div></div><div><h3>Objectives</h3><div>Existing research lacks a consolidated view, particularly empirically-driven guidelines based on published evidence and development practices. Therefore, our objective is to derive new or leverage existing development processes, patterns, and models to design, implement, and validate BBS systems.</div></div><div><h3>Method</h3><div>Tied to this knowledge gap, we conducted a two-phase research study that unifies the findings of (i) a systematic literature review and (ii) practitioners’ survey to derive and validate the development process for BBS systems. First, we conducted a systematic literature review of 58 studies to derive a process comprising 26 activities to develop BBS systems. 
We then engaged 102 blockchain practitioners from 35 countries across 6 continents to validate the BBS system development processes.</div></div><div><h3>Results</h3><div>Our results revealed a statistically significant difference (<em>p</em>-value < .001) in the importance ratings of 24 out of 26 BBS activities by our participants. The only two activities that were not statistically significant were incentive protocol design and granularity design. Our study also presented some of the activities that have been emphasised by our participants within the different development phases (i.e., Analysis Phase, Design Phase, Implementation Phase, Deployment Phase, and Execution and Maintenance Phase).</div></div><div><h3>Conclusion</h3><div>Our research is among the first to advance understanding on the aspect of development process for BBS and helps researchers and practitioners in their quests on challenges and recommendations associated with the development of BBS systems.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107825"},"PeriodicalIF":3.8,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144490826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liang Yu , Emil Alégroth , Panagiota Chatzipetrou , Tony Gorschek
{"title":"Measuring the quality of generative AI systems: Mapping metrics to quality characteristics — Snowballing literature review","authors":"Liang Yu , Emil Alégroth , Panagiota Chatzipetrou , Tony Gorschek","doi":"10.1016/j.infsof.2025.107802","DOIUrl":"10.1016/j.infsof.2025.107802","url":null,"abstract":"<div><h3>Context</h3><div>Generative Artificial Intelligence (GenAI) and the use of Large Language Models (LLMs) have revolutionized tasks that previously required significant human effort, which has attracted considerable interest from industry stakeholders. This growing interest has accelerated the integration of AI models into various industrial applications. However, the model integration introduces challenges to product quality, as conventional quality measuring methods may fail to assess GenAI systems. Consequently, evaluation techniques for GenAI systems need to be adapted and refined. Examining the current state and applicability of evaluation techniques for the GenAI system outputs is essential.</div></div><div><h3>Objective</h3><div>This study aims to explore the current metrics, methods, and processes for assessing the outputs of GenAI systems and the potential of risky outputs.</div></div><div><h3>Method</h3><div>We performed a snowballing literature review to identify metrics, evaluation methods, and evaluation processes from 43 selected papers.</div></div><div><h3>Results</h3><div>We identified 28 metrics and mapped these metrics to four quality characteristics defined by the ISO/IEC 25023 standard for software systems. Additionally, we discovered three types of evaluation methods to measure the quality of system outputs and a three-step process to assess faulty system outputs. 
Based on these insights, we suggested a five-step framework for measuring system quality while utilizing GenAI models.</div></div><div><h3>Conclusion</h3><div>Our findings present a mapping that visualizes candidate metrics to be selected for measuring quality characteristics of GenAI systems, accompanied by step-by-step processes to assist practitioners in conducting quality assessments.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107802"},"PeriodicalIF":3.8,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144336020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matheus Araújo Aguiar , Elvira Albert , Samir Genaim , Pablo Gordillo , Alejandro Hernández-Cerezo , Daniel Kirchner , Albert Rubio
{"title":"Neural-guided superoptimization in ethereum","authors":"Matheus Araújo Aguiar , Elvira Albert , Samir Genaim , Pablo Gordillo , Alejandro Hernández-Cerezo , Daniel Kirchner , Albert Rubio","doi":"10.1016/j.infsof.2025.107800","DOIUrl":"10.1016/j.infsof.2025.107800","url":null,"abstract":"<div><h3>Context:</h3><div>Superoptimization is a synthesis technique that, given a <em>loop-free sequence</em> of instructions, searches for an equivalent sequence that is <em>optimal wrt.</em> an objective function. Superoptimization of Ethereum smart contracts aims at minimizing the <em>size of their bytecode</em> and the <em>gas consumption</em> of executing the contract’s functions. The search for the optimal solution poses huge computational demands – as the search space to find the optimal sequence is exponential on the given <em>size-bound</em> – being the main challenge for superoptimization today to scale up to real, industrial software. Even if the underlying problem for finding the optimal solution is decidable, practical tools often prioritize efficiency over completeness. This means they might be implemented to find a sub-optimal solution or even time out.</div></div><div><h3>Objective:</h3><div>This work aims at leveraging superoptimization to a real setting: Ethereum blockchain. 
This paper proposes a <em>neural-guided superoptimization</em> (NGS) approach which incorporates deep neural networks using (supervised) learning into superoptimization to improve scalability by predicting: (1) if a sequence is already optimal and hence the search can be skipped; (2) the size-bound for the optimal solution in order to reduce the search space.</div></div><div><h3>Method:</h3><div>We have downloaded over 13,000 smart contracts deployed on the blockchain for training and testing the machine learning models, and a disjoint set with 100 of the smart contracts with more transactions to prove our scalability gains and impact for the Ethereum community.</div></div><div><h3>Results:</h3><div>Incorporating DNNs resulted in a 16x overall speedup (12x for gas) with only 12% optimization loss (14% for gas), or a 3-4x speedup with no optimization loss. For the 100 analyzed contracts, this approach reduced the average compilation time to 3 min per contract and achieved monetary savings of $1.24M.</div></div><div><h3>Conclusions:</h3><div>The integration of machine learning models mitigates several limitations of traditional superoptimization by drastically reducing execution times while maintaining most of the original optimization gains.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107800"},"PeriodicalIF":3.8,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144339006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging synthetic trace generation of modeling operations for intelligent modeling assistants using large language models","authors":"Vittoriano Muttillo , Claudio Di Sipio , Riccardo Rubei , Luca Berardinelli","doi":"10.1016/j.infsof.2025.107806","DOIUrl":"10.1016/j.infsof.2025.107806","url":null,"abstract":"<div><h3>Context:</h3><div>Due to the proliferation of generative AI models in different software engineering tasks, the research community has started to exploit those models, spanning from requirement specification to code development. Model-Driven Engineering (MDE) is a paradigm that leverages software models as primary artifacts to automate tasks. In this respect, modelers have started to investigate the interplay between traditional MDE practices and Large Language Models (LLMs) to push automation. Although powerful, LLMs exhibit limitations that undermine the quality of generated modeling artifacts, e.g., hallucination or incorrect formatting. Recording modeling operations relies on human-based activities to train modeling assistants, helping modelers in their daily tasks. Nevertheless, those techniques require a huge amount of training data that cannot be available due to several factors, e.g., security or privacy issues.</div></div><div><h3>Objective:</h3><div>In this paper, we propose an extension of a conceptual MDE framework, called MASTER-LLM, that combines different MDE tools and paradigms to support industrial and academic practitioners.</div></div><div><h3>Method:</h3><div>MASTER-LLM comprises a modeling environment that acts as the active context in which a dedicated component records modeling operations. Then, model completion is enabled by the modeling assistant trained on past operations. 
Different LLMs are used to generate a new dataset of modeling events to speed up recording and data collection.</div></div><div><h3>Results:</h3><div>To evaluate the feasibility of MASTER-LLM in practice, we experiment with two modeling environments, i.e., CAEX and HEPSYCODE, employed in industrial use cases within European projects. We investigate how the examined LLMs can generate realistic modeling operations in different domains.</div></div><div><h3>Conclusion:</h3><div>We show that synthetic traces can be effectively used when the application domain is less complex, while complex scenarios require human-based operations or a mixed approach according to data availability. However, generative AI models must be assessed using proper methodologies to avoid security issues in industrial domains.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107806"},"PeriodicalIF":3.8,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144290857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alejandra Duque-Torres , Claus Klammer , Stefan Fischer , Dietmar Pfahl , Rudolf Ramler
{"title":"Assessing the strength of Metamorphic Testing applied to optimisation software—Experience from industry","authors":"Alejandra Duque-Torres , Claus Klammer , Stefan Fischer , Dietmar Pfahl , Rudolf Ramler","doi":"10.1016/j.infsof.2025.107807","DOIUrl":"10.1016/j.infsof.2025.107807","url":null,"abstract":"<div><h3>Context:</h3><div>The testing of optimisation algorithms (OAs) is difficult due to the test oracle problem. Metamorphic Testing (MT) addresses this challenge. We previously applied MT to a black-box industrial OA and gained first insights into applying MT for OA testing.</div></div><div><h3>Objective:</h3><div>We noticed that some of the identified Metamorphic Relations (MRs) seemed to be more powerful than others. We now define and evaluate an approach to assess and rank MRs in terms of their defect-detection capabilities.</div></div><div><h3>Method:</h3><div>We propose a three-phase approach for assessing the strength of MRs. First, we evaluate the applicability of each MR based on Test Data (TD) using MetaTrimmer, an approach for selecting and constraining MRs. Second, we generate System Under Test (SUT) mutants and classify them into three levels. Level one contains mutants that do not change their behaviour for all TD (equivalent mutants). Level two contains all mutants that change behaviour for every TD (trivial mutants). Level three contains all other mutants. Third, we assess MR effectiveness per level based on the MR’s violation/non-violation ratio.</div></div><div><h3>Results:</h3><div>Among 405 generated SUT mutants analysed, 236 fell into level one, 85 into level two, and 84 into level three. The analysis of the amount of TD triggering an MR violation in each level per mutant revealed that some MRs have higher sensitivity than others.</div></div><div><h3>Conclusion:</h3><div>Our findings show that assessing MRs using our three-level strategy provides clear insights into their defect-detection capabilities. 
By integrating MetaTrimmer with mutation testing, we identify MRs that are effective in catching faults and sensitive to variations in TD. While evaluated on an OA in an industrial setting, this approach is generalisable to other SUTs. It is actionable for practitioners and researchers seeking to identify robust MRs and offers a structured methodology to evaluate and rank them.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107807"},"PeriodicalIF":3.8,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144297131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manuel Quintela-Pumares, Daniel Fernández-Lanvin, Alberto-Manuel Fernandez-Alvarez
{"title":"Heuristic-based incremental local domain model generation","authors":"Manuel Quintela-Pumares, Daniel Fernández-Lanvin, Alberto-Manuel Fernandez-Alvarez","doi":"10.1016/j.infsof.2025.107817","DOIUrl":"10.1016/j.infsof.2025.107817","url":null,"abstract":"<div><h3>Context</h3><div>Current front-end frameworks and technologies enable rich clients to operate autonomously without frequent server requests. To achieve this autonomy, clients must maintain a Local Domain Model (LDM), often derived from the Global Domain Model (GDM) on the backend. Manually designing an LDM that is consistent with the GDM requires handling nuanced dependencies, an error-prone task where oversights easily occur.</div></div><div><h3>Objective</h3><div>We aim to address these challenges by: (a) formally mapping dependencies between GDM and LDM; (b) analyzing effort and errors when modelling without assistance; and (c) providing a semi-automated method leveraging these dependencies to significantly reduce both effort and errors.</div></div><div><h3>Method</h3><div>To achieve these objectives, we propose a heuristic-based, step-by-step guided method. This approach leverages pre-existing GDM information to incrementally uncover dependencies and automate LDM construction as designers identify local behavior of GDM elements. We assessed this method's impact through an empirical experiment where we aimed to identify common mistakes and quantify effort during LDM construction. Expert UML modelers completed an LDM creation task both manually and with our tool-supported method. We recorded errors and interactive effort to establish a baseline and measure impact. User perceptions were gathered via a survey; an analytical usability study based on GOMS complemented findings.</div></div><div><h3>Results</h3><div>The proportion of users committing errors decreased by 77.8 % with the tool, and the average error count per user was reduced by 97.3 %. 
Time to complete the task decreased by 35.0 % and interactive effort by 44.6 %, consistent with GOMS predictions. Surveys showed a majority of positive responses across all items.</div></div><div><h3>Conclusions</h3><div>Our approach effectively streamlines Local Domain Model creation. By automatically detecting dependencies and guiding designers, the tool drastically reduces error rates, cuts completion time, and lowers interaction volume. Expert users rated the method positively, affirming that benefits of guided, incremental LDM construction outweigh adoption effort.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107817"},"PeriodicalIF":3.8,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144329797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trust, transparency, and adoption in generative AI for software engineering: Insights from Twitter discourse","authors":"Manaal Basha, Gema Rodríguez-Pérez","doi":"10.1016/j.infsof.2025.107804","DOIUrl":"10.1016/j.infsof.2025.107804","url":null,"abstract":"<div><h3>Context:</h3><div>The rise of AI-driven coding assistants, such as GitHub Copilot and ChatGPT, is transforming software development practices. Despite their growing impact, informal user feedback on these tools is often neglected.</div></div><div><h3>Objective:</h3><div>This study aims to analyze Twitter/X conversations to understand user opinions on the benefits, challenges, and barriers associated with Code Generation Tools (CGTs) in software engineering. By incorporating diverse perspectives from developers, hobbyists, students, and critics, this research provides a comprehensive view of public sentiment.</div></div><div><h3>Methods:</h3><div>We employed a hybrid approach using BERTopic and open coding to collect and analyze data from approximately 90,000 tweets. The focus was on identifying themes and sentiments related to various CGTs. The study sought to determine the most frequently discussed topics and their related sentiment, followed by highlighting the recurring feedback or criticisms that could influence generative AI (GenAI) adoption in software engineering.</div></div><div><h3>Results:</h3><div>Our analysis identified several significant themes, including productivity enhancements, shifts in developer practices, regulatory uncertainty, and a demand for neutral GenAI content. While some users praised the efficiency benefits of CGTs, others raised concerns regarding intellectual property, transparency, and potential biases.</div></div><div><h3>Conclusion:</h3><div>The findings highlight that addressing issues of trust, accountability, and legal clarity is essential for the successful integration of CGTs in software development. 
These insights underscore the need for ongoing dialogue and refinement of CGTs to better align with user expectations and mitigate concerns.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107804"},"PeriodicalIF":3.8,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144297129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kai-Kristian Kemell , Matti Saarikallio , Anh Nguyen-Duc , Pekka Abrahamsson
{"title":"Still just personal assistants? – A multiple case study of generative AI adoption in software organizations","authors":"Kai-Kristian Kemell , Matti Saarikallio , Anh Nguyen-Duc , Pekka Abrahamsson","doi":"10.1016/j.infsof.2025.107805","DOIUrl":"10.1016/j.infsof.2025.107805","url":null,"abstract":"<div><h3>Context:</h3><div>Generative AI (GenAI) is argued to transform software engineering (SE) in various ways, and GenAI tools show promise for various SE tasks. Software organizations across the globe are currently exploring the use of GenAI for SE.</div></div><div><h3>Objective:</h3><div>While numerous studies have recently been published on GenAI, few studies have looked at the adoption of these tools and their usage from an organizational point of view, focusing instead on individual users. Our objective is to understand how organizations adopt these tools and what their impacts are in industrial contexts, with a focus on the European perspective.</div></div><div><h3>Method:</h3><div>We conducted a multiple case study of seven European companies. We collected data through semi-structured interviews (n=15), as well as through longitudinal observation in one case company. All data were analyzed using thematic analysis.</div></div><div><h3>Results:</h3><div>We analyzed 28 transcripts, resulting in 456 quotations and 557 code occurrences split between 66 individual codes that were categorized under 6 high-level themes. We identified 25 types of tasks GenAI was currently being used for in our case organizations. We identified 12 benefits for GenAI in SE and 10 adoption and use challenges. Key adoption challenges for organizations include data privacy and legislative concerns, the emerging and fast-moving market of GenAI tools, difficulty of measuring the positive impact of the tools, and potential change resistance. 
For individuals, the key challenges are related to prompting, such as understanding what a good prompt is, and how to write prompts for specific tasks.</div></div><div><h3>Conclusion:</h3><div>GenAI adoption is becoming widespread in SE, but good practices and use cases are still emerging. While GenAI can potentially produce various benefits in SE, companies and individual users are facing various challenges in making the most of GenAI in SE. Overall, GenAI is still primarily used as a personal assistant.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"186 ","pages":"Article 107805"},"PeriodicalIF":3.8,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144313726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}