{"title":"A multiple case study on reuse in Game Software Engineering","authors":"Jose Ignacio Trasobares , África Domingo , Rodrigo Casamayor , Daniel Blasco , Carlos Cetina","doi":"10.1016/j.infsof.2025.107781","DOIUrl":"10.1016/j.infsof.2025.107781","url":null,"abstract":"<div><h3>Context:</h3><div>Game Software Engineering (GSE) is a specialized field at the intersection of software engineering and video game development. Reuse in GSE is particularly complex due to the iterative nature of game development and technical needs that arise in creating interactive digital experiences.</div></div><div><h3>Objective:</h3><div>This paper presents the first multi-case study on reuse in GSE, focusing on how reusable components are developed and maintained in game projects. The study aims to investigate reuse practices by analyzing multiple sources, including access to game projects, interviews with developers, focus groups, studio visits, and code analysis.</div></div><div><h3>Method:</h3><div>The study integrates various evidence sources to gain a comprehensive view of reuse in GSE. Data were gathered from interviews and focus groups, supplemented by direct observations during visits. Additionally, a recent proposal on software phylogenetics was applied to analyze source code, providing insights into reuse in game projects.</div></div><div><h3>Results:</h3><div>Our findings highlight the significance of prefabs in promoting reuse, especially in managing complex game objects. Prefabs emerged as a widely used element, confirmed by developer feedback and repository analysis. Software phylogenetics also revealed certain drawbacks.</div></div><div><h3>Conclusion:</h3><div>While prefabs play a relevant role in enhancing reusability, they can introduce redundancy, bugs, and unused components (dead prefabs). Understanding these limitations could inspire future research addressing such issues.
Prefab-related practices in GSE could benefit other software engineering areas, encouraging broader reuse strategies.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"185 ","pages":"Article 107781"},"PeriodicalIF":3.8,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144116721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
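The "dead prefabs" the conclusion mentions (prefab assets no longer referenced by any scene or other prefab) can be detected mechanically. A minimal sketch, not the paper's tooling, assuming references are resolved to 32-character GUID strings as in Unity asset files; the input dicts stand in for a parsed project:

```python
import re

# Unity-style asset references look like "guid: <32 hex chars>".
GUID_RE = re.compile(r"guid: ([0-9a-f]{32})")

def dead_prefabs(prefab_guids, asset_texts):
    """prefab_guids: prefab file name -> its own GUID (from the .meta file).
    asset_texts: asset file name -> raw text of scenes/prefabs that may
    reference other assets by GUID.
    Returns the prefabs whose GUID is never referenced anywhere."""
    referenced = set()
    for text in asset_texts.values():
        referenced.update(GUID_RE.findall(text))
    return sorted(name for name, guid in prefab_guids.items()
                  if guid not in referenced)
```

A project-wide pass like this would flag candidates only; a prefab loaded dynamically by name would still look "dead" to a GUID scan, which is one reason such cleanup is hard in practice.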
{"title":"Multi-source cross-domain vulnerability detection based on code pre-trained model","authors":"Yang Cao, Yunwei Dong","doi":"10.1016/j.infsof.2025.107764","DOIUrl":"10.1016/j.infsof.2025.107764","url":null,"abstract":"<div><h3>Context:</h3><div>In recent years, deep learning-based vulnerability detection methods have achieved significant success. These methods predict vulnerabilities by automatically learning patterns from code annotated with vulnerability information. However, labeled data is usually concentrated in a few software projects and programming languages. In practice, due to distribution discrepancy in vulnerabilities across different software projects or programming languages, vulnerability detection models trained on limited projects or a specific language often struggle to generalize to new projects or languages. Currently, cross-domain vulnerability detection methods utilize domain adaptation to reduce the distribution discrepancy between the labeled source domain and the target domain being tested. However, the language models used in existing methods limit the expressive power of feature vectors, and they only employ single-source domain adaptation methods.</div></div><div><h3>Objective:</h3><div>To address the limitations of current cross-domain vulnerability detection methods, we propose a new method for <u>M</u>ulti-<u>S</u>ource cross-domain <u>V</u>ulnerability <u>D</u>etection (<em>MSVD</em>).</div></div><div><h3>Method:</h3><div>MSVD combines two knowledge transfer methods, fine-tuning and domain adaptation. The fine-tuned code pre-trained model extracts code features, generating more meaningful code vector representations. 
The adversarial-based multi-source domain adaptation method aligns features between multiple source domains and the target domain, leveraging richer knowledge from multiple source domains.</div></div><div><h3>Results:</h3><div>We conducted experiments on real datasets comprising various languages and projects to evaluate the effectiveness of MSVD. Experimental results show that, compared to the baselines in the target domain, MSVD improves F1-score, accuracy, and AUC in the cross-language scenario by 2.95%<span><math><mo>∼</mo></math></span>112.90%, 4.37%<span><math><mo>∼</mo></math></span>27.65%, and 4.19%<span><math><mo>∼</mo></math></span>57.83%, respectively. Additionally, in the cross-project scenario, MSVD achieves the highest F1-score and shows superior performance in terms of accuracy and AUC.</div></div><div><h3>Conclusion:</h3><div>These results indicate that compared to the current state-of-the-art methods, MSVD significantly improves vulnerability detection performance in two cross-domain settings: cross-language and cross-project, when the target domain is unlabeled.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"185 ","pages":"Article 107764"},"PeriodicalIF":3.8,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144123312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
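The distribution discrepancy that motivates MSVD's domain adaptation can be made concrete with a simple feature-space statistic. The sketch below is not the paper's adversarial method; it measures the gap between the mean feature vectors of each source domain and the target, and derives inverse-gap weights as a crude stand-in for how multiple sources might be balanced:

```python
from math import sqrt

def mean_vector(features):
    """Component-wise mean of a list of equal-length feature vectors."""
    n, dim = len(features), len(features[0])
    return [sum(v[i] for v in features) / n for i in range(dim)]

def domain_gap(source, target):
    """Euclidean distance between mean feature vectors (a crude proxy
    for distribution discrepancy such as MMD)."""
    ms, mt = mean_vector(source), mean_vector(target)
    return sqrt(sum((a - b) ** 2 for a, b in zip(ms, mt)))

def source_weights(sources, target):
    """Weight each source domain inversely to its gap from the target,
    so closer domains contribute more."""
    inv = [1.0 / (domain_gap(s, target) + 1e-9) for s in sources]
    z = sum(inv)
    return [w / z for w in inv]
```

In the actual method the alignment is learned adversarially rather than computed from fixed statistics, but the intuition is the same: source domains whose feature distributions sit closer to the target should dominate the transferred knowledge.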
{"title":"On the understandability of coupling-related practices in infrastructure-as-code based deployments","authors":"Pierre-Jean Quéval , Nicole Elisabeth Hörner , Evangelos Ntentos , Uwe Zdun","doi":"10.1016/j.infsof.2025.107761","DOIUrl":"10.1016/j.infsof.2025.107761","url":null,"abstract":"<div><div>Infrastructure as Code (IaC) empowers software developers and operations teams to automate the deployment and management of IT infrastructure through code. This is particularly valuable for continuously released deployments such as microservices and cloud-based systems. IaC technologies offer flexibility in provisioning and deploying application architectures. However, if the structure is not well-designed, it can lead to severe issues related to coupling aspects. Unfortunately, a lack of comprehensive coupling guidelines for IaC makes ensuring adherence to best practices challenging. Leveraging IaC-based models, metrics, and source code can enhance the comprehension and implementation of coupling measures.</div><div>Our objective was to investigate how developers understand information derived from system source code and compare it to formal IaC system diagrams and metrics. We conducted a controlled experiment involving a group of participants to evaluate the understandability of IaC system architecture descriptions through source code inspection and formal representations.</div><div>We hypothesized that providing formal IaC system diagrams and metrics as supplementary materials would improve the understanding of IaC coupling-related practices measured by task <em>correctness</em>. 
We also expected that these supplementary resources would lead to a significant increase in task <em>duration</em> and that there would be a notable correlation between <em>correctness</em> and <em>duration</em>.</div><div>The results suggest that including formal IaC system diagrams and metrics as supplementary materials significantly enhances the comprehension of IaC coupling-related practices, as indicated by task <em>correctness</em>. Moreover, providing these formal representations does not significantly prolong task <em>duration</em>, indicating that they do not hinder understanding. A substantial correlation between task <em>correctness</em> and <em>duration</em> is evident when formal IaC system diagrams and metrics are available.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"185 ","pages":"Article 107761"},"PeriodicalIF":3.8,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144139158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
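Coupling metrics of the kind supplied to participants alongside the formal diagrams can be computed from a dependency graph extracted from IaC definitions. A sketch, assuming the deployment has already been parsed into a service-to-dependencies mapping; the metric names (Ce, Ca, instability I) follow common coupling terminology rather than the study's exact metric set:

```python
def coupling_metrics(deps):
    """deps: component -> list of components it depends on.
    Returns efferent coupling Ce (outgoing), afferent coupling Ca
    (incoming), and instability I = Ce / (Ce + Ca) per component."""
    components = set(deps) | {d for ds in deps.values() for d in ds}
    ce = {c: len(set(deps.get(c, ()))) for c in components}
    ca = {c: sum(c in ds for ds in deps.values()) for c in components}
    metrics = {}
    for c in components:
        total = ce[c] + ca[c]
        metrics[c] = {"Ce": ce[c], "Ca": ca[c],
                      "I": ce[c] / total if total else 0.0}
    return metrics
```

A component with high Ca and I near 0 (like a shared database) is a coupling hotspot: many deployment units break if it changes.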
{"title":"GPTs are not the silver bullet: Performance and challenges of using GPTs for security bug report identification","authors":"Horácio L. França , Katerina Goseva-Popstojanova , César Teixeira , Nuno Laranjeiro","doi":"10.1016/j.infsof.2025.107778","DOIUrl":"10.1016/j.infsof.2025.107778","url":null,"abstract":"<div><h3>Context:</h3><div>Identifying security bugs in software is critical to minimize vulnerability windows. Traditionally, bug reports are submitted through issue trackers and manually analyzed, which is time-consuming. Challenges such as data scarcity and imbalance generally hinder the development of effective machine learning models that could be used to automate this task. Generative Pre-trained Transformer (GPT) models do not require training and are less affected by the imbalance problem. Therefore, they have gained popularity for various text-based classification tasks, making them appear to be a natural and highly promising solution to this problem.</div></div><div><h3>Objective:</h3><div>This paper explores the potential of using GPT models to identify security bug reports from the perspective of a user of this type of model. We aim to assess their classification performance in this task compared to traditional machine learning (ML) methods, while also investigating how different factors, such as the prompt used and the datasets’ characteristics, affect their results.</div></div><div><h3>Methods:</h3><div>We evaluate the performance of four state-of-the-art GPT models (i.e., GPT4All-Falcon, Wizard, Instruct, OpenOrca) on the task of security bug report identification. We use three different prompts for each GPT model and compare the results with traditional ML models.
The empirical results are based on using bug report data from seven projects (i.e., Ambari, Camel, Derby, Wicket, Nova, OpenStack, and Ubuntu).</div></div><div><h3>Results:</h3><div>GPT models show noticeable difficulties in identifying security bug reports, with performance levels generally lower than traditional ML models. The effectiveness of the GPT models is quite variable, depending on the specific model and prompt used, as well as the particular dataset.</div></div><div><h3>Conclusion:</h3><div>Although GPT models are nowadays used in many types of tasks, including classification, their current performance in security bug report identification is surprisingly insufficient and inferior to traditional ML models. Further research is needed to address the challenges identified in this paper in order to effectively apply GPT models to this particular domain.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"185 ","pages":"Article 107778"},"PeriodicalIF":3.8,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144116798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
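Scoring a classifier on security bug reports reduces to standard metrics; under the class imbalance the abstract mentions, F1 on the security class is the informative score, since accuracy is dominated by the majority non-security class. A minimal sketch of that evaluation step (the label names are illustrative):

```python
def f1_security(gold, pred, positive="security"):
    """F1-score on the positive (security) class, computed from
    gold labels and predicted labels of equal length."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```

The same function applied to each model/prompt/dataset combination is what makes the cross-model comparison in the Results section possible.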
{"title":"Exploring continual learning in code intelligence with domain-wise distilled prompts","authors":"Shuo Liu , Jacky Keung , Zhen Yang , Fang Liu , Fengji Zhang , Yicheng Sun","doi":"10.1016/j.infsof.2025.107775","DOIUrl":"10.1016/j.infsof.2025.107775","url":null,"abstract":"<div><h3>Context:</h3><div>Software programs evolve constantly in practice, leading to domain shifts that cannot be fitted in the traditional offline manner. Recently, a few Continual Learning (CL) studies on code intelligence emerged, which learn a sequence of datasets one by one. We observe that existing rehearsal-based CL methods rely heavily on retraining historical samples, which brings an extra training burden and a risk of data disclosure.</div></div><div><h3>Objective:</h3><div>To overcome the above limitations, in this paper, we leverage the superiority of prompts in eliciting pre-trained knowledge to realize a rehearsal-free method.</div></div><div><h3>Methods:</h3><div>We first explore the performance of vanilla prompt tuning in the CL scenario, finding that inheriting the previous Pre-trained Language Model (PLM) parameters is appropriate and prompt stability should be emphasized. Therefore, we propose an effective method named Prompt Tuning with Domain-wise Distillation (PTDD), which can distill prompts and optimize PLMs with a two-sided learning objective, thus improving PLMs’ performance in diverse domains.</div></div><div><h3>Results:</h3><div>We conduct experiments on three widely-studied code intelligence tasks, including Code Summarization, Code Vulnerability Detection, and Code Clone Detection. We evaluate PTDD in comparison with a series of baselines. Experimental results indicate the effectiveness of PTDD. For instance, PTDD surpasses fine-tuning by 2.55%, 11.12%, and 2.25% in the three tasks, respectively.
Moreover, we interpret the effectiveness of PTDD by prompt visualization, and discuss its performance in the low-resource scenario, where the improvement of PTDD becomes stark with fewer training samples and can reach up to 69.09%.</div></div><div><h3>Conclusion:</h3><div>To the best of our knowledge, our work conducts the first experimental study to explore the performance of prompt tuning within the CL setting in the code intelligence field. The research findings indicate the effectiveness of PTDD and contribute to a deeper understanding of the capability of prompts.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"185 ","pages":"Article 107775"},"PeriodicalIF":3.8,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144098340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
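The two-sided idea of tuning a prompt for a new domain while distilling from its previous value can be sketched as a regularized update: minimize the task loss plus a penalty pulling the prompt toward the old prompt. The toy below (a quadratic task loss on plain Python lists, not PTDD's actual objective) shows the mechanics and why the result lands between the old prompt and the new task optimum:

```python
def distilled_prompt_step(p, p_old, task_grad, lam=0.5, lr=0.1):
    """One gradient step on task_loss(p) + lam * ||p - p_old||^2.
    task_grad is the gradient of the task loss at p; the second term's
    gradient is 2 * lam * (p - p_old)."""
    return [pi - lr * (g + 2 * lam * (pi - qi))
            for pi, g, qi in zip(p, task_grad, p_old)]

def tune(p_old, target, lam=0.5, lr=0.1, steps=200):
    """Toy task loss ||p - target||^2, whose gradient is 2 * (p - target).
    Converges to the weighted average (target + lam * p_old) / (1 + lam)."""
    p = list(p_old)
    for _ in range(steps):
        grad = [2 * (pi - ti) for pi, ti in zip(p, target)]
        p = distilled_prompt_step(p, p_old, grad, lam, lr)
    return p
```

With `lam = 0.5`, an old prompt at 0 and a task optimum at 3, the tuned prompt settles at 2: the distillation term keeps the prompt stable across domains instead of letting it jump fully to each new optimum.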
{"title":"What problems are MLOps practitioners talking about? A study of discussions in Stack Overflow forum and GitHub projects","authors":"Yang Zhang, Yiwen Wu, Tao Wang, Bo Ding, Huaimin Wang","doi":"10.1016/j.infsof.2025.107768","DOIUrl":"10.1016/j.infsof.2025.107768","url":null,"abstract":"<div><h3>Context:</h3><div>Machine Learning Operations (MLOps) has emerged as a crucial technology for addressing the challenges of designing and maintaining productive ML applications. The widespread adoption of MLOps makes it essential to identify the problems faced by MLOps practitioners. However, there has been relatively little research in this area.</div></div><div><h3>Objectives:</h3><div>To fill this research gap, we aim to gain an understanding of the interests and difficulties encountered by MLOps practitioners.</div></div><div><h3>Methods:</h3><div>We mine discussion data from the online Q&A forum, Stack Overflow, and GitHub projects, and analyze 6345 posts and 2103 issues.</div></div><div><h3>Results:</h3><div>We construct the first taxonomy of MLOps problems in practice, consisting of 5 categories and 19 topics. We also investigate the evolution and characteristics (difficulty and sentiment) of these topics, distill 12 frequent solutions for different MLOps problems, and design an MLOps knowledge exploration tool, MLOps-KET.</div></div><div><h3>Conclusion:</h3><div>We find that practitioners face diverse challenges when performing MLOps practices and that the focus of their discussions has changed over time.
Our study contributes to the MLOps research and development community by providing implications for different audiences and guidance for future support of relevant techniques and tools.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"185 ","pages":"Article 107768"},"PeriodicalIF":3.8,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144089615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
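A lightweight version of this kind of discussion mining is keyword-based topic tagging over posts. The study itself derived its 19 topics through richer analysis; the topic names and keyword sets below are purely illustrative:

```python
from collections import Counter

# Illustrative keyword map; not the taxonomy from the study.
TOPIC_KEYWORDS = {
    "pipeline orchestration": {"airflow", "kubeflow", "pipeline", "dag"},
    "model serving": {"serving", "inference", "endpoint", "latency"},
    "experiment tracking": {"mlflow", "tracking", "metrics", "artifact"},
}

def tag_posts(posts):
    """Count how many posts mention at least one keyword of each topic."""
    counts = Counter()
    for post in posts:
        words = set(post.lower().split())
        for topic, keys in TOPIC_KEYWORDS.items():
            if words & keys:
                counts[topic] += 1
    return counts
```

Counting topic hits over time-bucketed posts is also the simplest way to observe the kind of topic-evolution trend the study reports.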
{"title":"Systematic Review of Software Product Value: Perspectives Beyond Functionality","authors":"C.R. Oruthotaarachchi, W.M.J.I. Wijayanayake","doi":"10.1016/j.infsof.2025.107784","DOIUrl":"10.1016/j.infsof.2025.107784","url":null,"abstract":"<div><h3>Context</h3><div>Developing software products that effectively address customer needs while offering high business value is essential. Traditional software value assessments focus on technical performance, cost-effectiveness, and business impact, prioritizing security and reliability. Current definitions have emerged primarily from technical and economic perspectives, leaving people-oriented perspectives out of the discussion. A gap remains in the discussion of how people-oriented domains such as management and marketing relate to software product value.</div></div><div><h3>Objective</h3><div>This paper presents a systematic literature review that investigates different perspectives of software product value, combining insights from management, marketing, design, and software engineering domains to provide a holistic view of software product value.</div></div><div><h3>Method</h3><div>The study was conducted based on an established systematic review methodology searching for articles published from 2004 to 2024 in five academic databases. A qualitative data analysis approach was used to answer the research questions, and a PRISMA statement was followed to ensure the rigorous reporting of this research.</div></div><div><h3>Results</h3><div>The search process yielded 67 articles, providing valuable insights into the existing discussions of software product value.
The findings emphasize that, in addition to functional and non-functional requirements, software product managers must prioritize psychological and social requirements, provide seamless customer relationship management, and connect the software product with both the software and client organizations’ strategic ambitions.</div></div><div><h3>Conclusion</h3><div>The value of software products is not limited to their performance but also the perception of benefits, emotions and brand identity. Integrating software development with exact customer objectives, organizational goals, and market demands significantly maximizes perceived software value. This integrated strategy is critical for increasing value throughout the product's lifecycle and ensuring product market sustainability.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"185 ","pages":"Article 107784"},"PeriodicalIF":3.8,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144089670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A systematic literature review on transformation for testability techniques in software systems","authors":"Fateme Bagheri-Galle , Saeed Parsa , Morteza Zakeri","doi":"10.1016/j.infsof.2025.107788","DOIUrl":"10.1016/j.infsof.2025.107788","url":null,"abstract":"<div><h3>Context</h3><div>Software testability is a critical aspect of software development, enabling efficient error identification during testing. Program transformation techniques, mainly refactoring, play a key role in enhancing testability by simplifying the process of identifying and addressing potential issues. By improving testability, developers empower themselves to create more dependable software products.</div></div><div><h3>Objective</h3><div>Our study aims to conduct a systematic literature review focused on transformation techniques for improving testability in software systems. By analyzing existing research, we seek to provide insights into effective strategies for enhancing testability and addressing critical issues in software development.</div></div><div><h3>Method</h3><div>We queried six digital libraries, resulting in over 5000 articles. After rigorous analysis, we narrowed our focus to 39 primary research papers. Based on a novel hierarchical classification of the approaches used to enhance testability, the selected articles were analyzed considering the refactoring techniques, software metrics, and code smells affecting testability at the design and code levels.</div></div><div><h3>Results</h3><div>Our investigation revealed that 53.8 % of the papers employed refactoring for testability, while 46.2 % utilized testability transformation techniques. Only one study provided structured sequences of refactoring for testability. The studies primarily focused on three testing levels: unit testing, regression testing, and graphical user interface (GUI) testing. Notably, unit testing received the most attention, appearing in 71.8 % of the studies.
About 64.1 % of the studies involved software projects written in the Java programming language. The results suggest that removing code smells and anti-patterns through refactoring would increase testability.</div></div><div><h3>Conclusion</h3><div>While transformation techniques are essential to increase testability and often improve it, more research is needed to address this critical issue. Additionally, exploring other levels of testing beyond unit testing and using software projects with languages beyond Java is essential. To enhance testability, it is necessary to provide more refactoring sequences aimed at improving testability.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"185 ","pages":"Article 107788"},"PeriodicalIF":3.8,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144116720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SolBERT: Advancing solidity smart contract similarity analysis via self-supervised pre-training and contrastive fine-tuning","authors":"Zhenzhou Tian , Yudong Teng , Xianqun Ke , Yanping Chen , Lingwei Chen","doi":"10.1016/j.infsof.2025.107766","DOIUrl":"10.1016/j.infsof.2025.107766","url":null,"abstract":"<div><h3>Context:</h3><div>Reliable and effective similarity analysis for smart contracts facilitates the maintenance and quality assurance of the smart contract ecosystem. However, existing signature-based methods and code representation learning-based methods suffer from limitations such as heavy-weight program analysis payloads or suboptimal contract encodings.</div></div><div><h3>Objective:</h3><div>This paper aims to design a fully unsupervised language model for better capturing the syntactic and semantic richness of Solidity code, and utilizes it for advancing the effectiveness of smart contract similarity analysis.</div></div><div><h3>Methods:</h3><div>Inspired by the impressive semantic learning capability of pre-trained language models (PLMs), we propose SolBERT, a PLM specifically tailored for enhancing Solidity smart contract similarity detection. To ensure it produces high-quality encodings, SolBERT leverages BERT-style pre-training with the masked language modeling (MLM) and token type prediction (TTP) tasks applied on code-structure-aware token sequences derived from the contracts’ abstract syntax trees (ASTs) through structure-retaining tree linearization and light-weight normalization to learn a base model. On this basis, self-supervised contrastive fine-tuning and unsupervised whitening operations are further performed to optimize contract encoding generation.</div></div><div><h3>Results:</h3><div>Experiments are conducted on three contract similarity-related tasks, including contract clone detection, bug detection, and code clustering.
The results indicate that SolBERT significantly outperforms state-of-the-art approaches with average absolute gains of 21.33% and 21.50% in terms of F1, and 17.78% and 26.60% in terms of accuracy for the clone detection and bug detection tasks, respectively; and an average absolute gain of 17.97% for the code clustering task. When applying both contrastive fine-tuning and whitening optimizations, SolBERT also outperforms variants lacking either of them.</div></div><div><h3>Conclusion:</h3><div>The proposed approach, SolBERT, can serve as a reliable and powerful smart contract encoder, better capturing the syntactic and semantic aspects of the Solidity code. The results and findings also validate the effectiveness and positive synergistic effect of SolBERT’s encoding optimization operations.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"184 ","pages":"Article 107766"},"PeriodicalIF":3.8,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143946831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
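Structure-retaining tree linearization, the step SolBERT uses to feed ASTs to a BERT-style model, flattens a tree into a token sequence while keeping the nesting recoverable. A toy sketch on a dict-based AST; the node format and marker syntax are assumptions, not the paper's exact scheme:

```python
def linearize(node):
    """Depth-first linearization with explicit open/close markers, so the
    original tree can be reconstructed from the flat token sequence."""
    tokens = ["<" + node["type"] + ">"]
    if "value" in node:
        tokens.append(node["value"])
    for child in node.get("children", []):
        tokens.extend(linearize(child))
    tokens.append("</" + node["type"] + ">")
    return tokens
```

Because every subtree is bracketed by a matched marker pair, masking a token inside the sequence (as MLM does) still leaves the surrounding structural context visible to the model.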
{"title":"Predicting the truck factor in a software repository using machine learning","authors":"Ahmed El Cheikh Ammar, Sukru Eraslan, Yeliz Yesilada","doi":"10.1016/j.infsof.2025.107765","DOIUrl":"10.1016/j.infsof.2025.107765","url":null,"abstract":"<div><h3>Context:</h3><div>The Truck or Bus factor is a metric that evaluates which developers would cause the development process in a software project to decelerate should they get removed (or hit by a truck/bus). Measuring the truck factor in software development is complex due to the many variables involved. Several algorithms have been developed to address this. However, they tend to focus narrowly on code-centric metrics such as commits made by a developer. While such a feature is important in assessing the contribution of a developer, it does not tell the whole story behind a contribution.</div></div><div><h3>Objective:</h3><div>This paper aims to consider a comprehensive set of version control system (VCS) features, including those that have not yet been investigated in the literature, with Machine Learning (ML) to predict the Truck Factor.</div></div><div><h3>Method:</h3><div>We examine what features existing algorithms utilize and then design a feature set that addresses various coding-based metrics, collaborative behaviors, developer activity patterns, and the broader technological context of a project. Afterwards, multiple supervised ML models with different algorithms, such as Random Forest and Naive Bayes, are designed to utilize this feature set to predict the key contributors in GitHub repositories, ultimately computing the truck factor, and then these ML models are compared with the literature.</div></div><div><h3>Results:</h3><div>Random Forest with hypertuned parameters and an aggregated model of hypertuned Random Forest and Naive Bayes with priors achieve the best performance, with mean F1-Scores of 84.1% and 86.4%, respectively.
These models outperform the existing algorithms except for one, behind which they lag slightly in precision.</div></div><div><h3>Conclusion:</h3><div>Our research addresses the limitations of existing work by investigating a wider range of VCS features and developing a supervised ML model to predict the truck factor, which demonstrates robust identification of true Truck Factor members.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"184 ","pages":"Article 107765"},"PeriodicalIF":3.8,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
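For contrast with the ML approach, the classical greedy truck-factor baseline fits in a few lines: repeatedly remove the author who covers the most files until more than half the files have no knowledgeable author left. A sketch assuming a precomputed file-to-authors ownership map (how "knowledgeable" is decided, e.g. by degree of authorship, is a separate step):

```python
def truck_factor(ownership, threshold=0.5):
    """ownership: file -> set of authors considered knowledgeable about it.
    Greedily remove the author covering the most files until more than
    `threshold` of the files are orphaned; the number of removed authors
    is the truck factor."""
    remaining = {f: set(a) for f, a in ownership.items()}
    total = len(remaining)
    removed = 0
    while sum(1 for a in remaining.values() if not a) / total <= threshold:
        counts = {}
        for authors in remaining.values():
            for author in authors:
                counts[author] = counts.get(author, 0) + 1
        if not counts:
            break
        top = max(counts, key=counts.get)  # author covering the most files
        for authors in remaining.values():
            authors.discard(top)
        removed += 1
    return removed
```

The paper's point is that feature sets richer than file ownership, fed to supervised models, predict these key contributors better than such purely code-centric greedy schemes.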