{"title":"Predicting test failures induced by software defects: A lightweight alternative to software defect prediction and its industrial application","authors":"Lech Madeyski , Szymon Stradowski","doi":"10.1016/j.jss.2025.112360","DOIUrl":"10.1016/j.jss.2025.112360","url":null,"abstract":"<div><h3>Context:</h3><div>Machine Learning Software Defect Prediction (ML SDP) is a promising method to improve the quality and minimise the cost of software development.</div></div><div><h3>Objective:</h3><div>We aim to: (1) propose and develop a Lightweight Alternative to SDP (LA2SDP) that predicts test failures induced by software defects to allow pinpointing defective software modules thanks to available mapping of predicted test failures to past defects and corrected modules, (2) preliminarily evaluate the proposed method in a real-world Nokia 5G scenario.</div></div><div><h3>Method:</h3><div>We train machine learning models using test failures that come from confirmed software defects already available in the Nokia 5G environment. We implement LA2SDP using five supervised ML algorithms, together with their tuned versions, and use eXplainable AI (XAI) to provide feedback to stakeholders and initiate quality improvement actions.</div></div><div><h3>Results:</h3><div>We have shown that LA2SDP is feasible in vivo using test failure-to-defect report mapping readily available within the Nokia 5G system-level test process, achieving good predictive performance. 
Specifically, CatBoost Gradient Boosting performed best and achieved satisfactory Matthews Correlation Coefficient (MCC) results for our feasibility study.</div></div><div><h3>Conclusions:</h3><div>Our efforts have successfully defined, developed, and validated LA2SDP, using the sliding and expanding window approaches on an industrial data set.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112360"},"PeriodicalIF":3.7,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143215068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The AmbiTRUS framework for identifying potential ambiguity in user stories","authors":"Anis R. Amna , Yves Wautelet , Stephan Poelmans , Samedi Heng , Geert Poels","doi":"10.1016/j.jss.2025.112357","DOIUrl":"10.1016/j.jss.2025.112357","url":null,"abstract":"<div><div>Ambiguity in natural language-based requirements is a well-known issue, often addressed as a singular problem despite its complexity. Studies reveal that ambiguity in user stories can manifest differently depending on the linguistic level.</div><div>This study introduces the ambiguity analysis framework (AmbiTRUS) to address these diverse manifestations by composing quality criteria for 13 types of ambiguity problems, classified across four linguistic levels and linked to four types of requirements quality problems. The proposed quality criteria were selected and adapted from three established user story quality frameworks: the QUS framework, the Agile Requirements Verification framework, and the INVEST framework.</div><div>To assess the potential effectiveness of AmbiTRUS, a controlled laboratory experiment was conducted with advanced MSc students representing novice practitioners among the intended users of the framework. While the experiment did not demonstrate clear effectiveness, users found the framework useful despite its complexity.</div><div>Insights from the experiment allowed redefining the framework's quality criteria. The main lesson learned from the experiment is the need for tool support in applying AmbiTRUS, particularly using NLP techniques to verify the quality criteria. 
The development of such an NLP-based tool and the evaluation of AmbiTRUS through a usability study of the tool are the next steps in our research.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112357"},"PeriodicalIF":3.7,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143215070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UVL: Feature modelling with the Universal Variability Language","authors":"David Benavides , Chico Sundermann , Kevin Feichtinger , José A. Galindo , Rick Rabiser , Thomas Thüm","doi":"10.1016/j.jss.2024.112326","DOIUrl":"10.1016/j.jss.2024.112326","url":null,"abstract":"<div><div>Feature modelling is a cornerstone of software product line engineering, providing a means to represent software variability through features and their relationships. Since its inception in 1990, feature modelling has evolved through various extensions, and after three decades of development, there is a growing consensus on the need for a standardised feature modelling language. Despite multiple endeavours to standardise variability modelling and the creation of various textual languages, researchers and practitioners continue to use their own approaches, impeding effective model sharing. In 2018, a collaborative initiative was launched by a group of researchers to develop a novel textual language for representing feature models. This paper introduces the outcome of this effort: the Universal Variability Language (<span>UVL</span>), which is designed to be human-readable and serves as a pivot language for diverse software engineering tools. The development of <span>UVL</span> drew upon community feedback and leveraged established literature in the field of variability modelling. The language is structured into three levels – Boolean, Arithmetic, and Type – and allows for language extensions to introduce additional constructs enhancing its expressiveness. <span>UVL</span> is integrated into various existing software tools, such as FeatureIDE and flamapy, and is maintained by a consortium of institutions. All tools that support the language are released in an open-source format, complemented by dedicated parser implementations for Python and Java. Beyond academia, <span>UVL</span> has found adoption within a range of institutions and companies. 
It is envisaged that <span>UVL</span> will become the language of choice in the future for a multitude of purposes, including knowledge sharing, educational instruction, and tool integration and interoperability. We envision <span>UVL</span> as a pivotal solution, addressing the limitations of prior attempts and fostering collaboration and innovation in the domain of software product line engineering.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"225 ","pages":"Article 112326"},"PeriodicalIF":3.7,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143479175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An empirical analysis of feature fusion task heads of ViT pre-trained models on OOD classification tasks","authors":"Mingxing Zhang, Jun Ai, Tao Shi","doi":"10.1016/j.jss.2025.112358","DOIUrl":"10.1016/j.jss.2025.112358","url":null,"abstract":"<div><div>ViT pre-trained models have been widely used in various downstream tasks, and the structure of the task head has a significant impact on these tasks. While it is common practice to empirically concatenate the cls tokens of the last few layers of the ViT model for classification, there is limited research on whether the feature fusion structure holds significance for the model. This paper primarily discusses the impact of an attention-mechanism-based fusion structure on the backbone network and classification performance. First, we examine the relationship between the dataset and the feature fusion task head. We then explore how different locations of the fusion middle layer affect model performance and how the feature fusion task head influences the backbone network itself. Finally, we characterize the task head through the loss of models based on the feature fusion structure. Based on empirical findings, we identify five important insights and provide recommendations for the model structures during downstream task fine-tuning.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112358"},"PeriodicalIF":3.7,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143349124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CG-FL: A data augmentation approach using context-aware genetic algorithm for fault localization","authors":"Jian Hu","doi":"10.1016/j.jss.2025.112359","DOIUrl":"10.1016/j.jss.2025.112359","url":null,"abstract":"<div><div>Fault localization (FL) is a critical step in software debugging. Coverage-based fault localization (CFL), one of the most promising FL techniques, utilizes coverage information obtained from program entities executed by test cases to determine the entities that are more likely to be faulty. However, CFL faces two main issues that limit its effectiveness. Firstly, the code coverage data contains numerous statements irrelevant to the observed failure, which makes the search scope too large for FL. Secondly, the input coverage data is highly imbalanced due to the presence of significantly more passing test cases than failing test cases, which biases the FL model toward the passing test cases. To address these problems, we propose CG-FL, a data augmentation approach using a context-aware genetic algorithm. Specifically, CG-FL first uses program slicing to construct a failure context for FL. Subsequently, CG-FL generates synthesized failing test cases through the application of the genetic algorithm. To evaluate the effectiveness of CG-FL, we compared it with six state-of-the-art FL methods and three representative data augmentation methods on 420 versions of 9 benchmarks. 
The experimental findings clearly indicate that CG-FL substantially enhances the effectiveness of the six FL methods and outperforms the three data augmentation methods.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112359"},"PeriodicalIF":3.7,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPLinker: Multi-template Prompt-tuning with adversarial training for Issue–commit Link recovery","authors":"Bangchao Wang , Yang Deng , Ruiqi Luo , Peng Liang , Tingting Bi","doi":"10.1016/j.jss.2025.112351","DOIUrl":"10.1016/j.jss.2025.112351","url":null,"abstract":"<div><div>In recent years, the pre-training, prompting and prediction paradigm, known as prompt-tuning, has achieved significant success in Natural Language Processing (NLP). Issue–commit Link Recovery (ILR) in Software Traceability (ST) plays an important role in improving the reliability, quality, and security of software systems. The current ILR methods convert the ILR into a classification task using pre-trained language models (PLMs) and dedicated neural networks. These methods do not fully utilize the semantic information embedded in PLMs, failing to achieve acceptable performance. To address this limitation, we introduce a novel paradigm: <strong>Multi-template Prompt-tuning</strong> with adversarial training for issue–commit <strong>Link</strong> recovery (MPLinker). MPLinker redefines the ILR task as a cloze task via template-based prompt-tuning and incorporates adversarial training to enhance model generalization and reduce overfitting. We evaluated MPLinker on six open-source projects using a comprehensive set of performance metrics. The experiment results demonstrate that MPLinker achieves an average F1-score of 96.10%, Precision of 96.49%, Recall of 95.92%, MCC of 94.04%, AUC of 96.05%, and ACC of 98.15%, significantly outperforming existing state-of-the-art methods. Overall, MPLinker improves the performance and generalization of ILR models and introduces innovative concepts and methods for ILR. 
The replication package for MPLinker is available at <span><span>https://github.com/WTU-intelligent-software-development/MPLinker</span></span>.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112351"},"PeriodicalIF":3.7,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143350866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Different approaches for testing body sensor network applications","authors":"Samira Silva , Ricardo Caldas , Patrizio Pelliccione , Antonia Bertolino","doi":"10.1016/j.jss.2025.112336","DOIUrl":"10.1016/j.jss.2025.112336","url":null,"abstract":"<div><div>Body Sensor Networks (BSNs) offer a cost-effective way to monitor patients’ health and detect potential risks. Despite the growing interest attracted by BSNs, there is a lack of testing approaches for them. Testing a Body Sensor Network (BSN) is challenging due to its evolving nature, the complexity of sensor scenarios and their fusion, the potential necessity of third-party testing for certification, and the need to prioritize critical failures given limited resources. This paper addresses these challenges by proposing three BSN testing approaches: PASTA, ValComb, and TransCov. These approaches share common characteristics, which are described through a general framework called GATE4BSN. PASTA simulates patients with sensors and models sensor trends using a Discrete Time Markov Chain (DTMC). ValComb explores various health conditions by considering all sensor risk level combinations, while TransCov ensures full coverage of DTMC transitions. We empirically evaluate these approaches, comparing them with a baseline approach in terms of failure detection. The results demonstrate that PASTA, ValComb, and TransCov uncover previously undetected failures in an open-source BSN and outperform the baseline approach. 
Statistical analysis reveals that PASTA is the most effective, while ValComb is 76 times faster than PASTA and nearly as effective.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112336"},"PeriodicalIF":3.7,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143377629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical tree-based algorithms for efficient expression parsing and test sequence generation in software models","authors":"Yihao Li , Pan Liu","doi":"10.1016/j.jss.2025.112354","DOIUrl":"10.1016/j.jss.2025.112354","url":null,"abstract":"<div><div>The software expression model serves as a formalized specification, accurately depicting software behavior and generating test sequences through algebraic operations derived from the model. Typically, automated algebraic manipulation involves constructing an abstract syntax tree (AST) for the expression, followed by traversing it to identify subexpressions. However, this approach introduces a significant amount of redundant algebraic operations, diminishing the efficiency of expression parsing. To address this challenge, this paper introduces HT-EP, an innovative hierarchical tree-based expression parsing algorithm. HT-EP transforms expressions into hierarchical trees, utilizing algebraic operations to process nodes efficiently and generate streamlined test sequences. Compared to ASTs, hierarchical trees exhibit a simplified structure with fewer nodes, enabling faster traversal. Our experiment involved 124 expressions from scholarly papers over the past six decades and core functional expressions from 15 open-source software projects. The goal was to assess the parsing and fault detection capabilities of HT-EP against four other expression parsing algorithms. Additionally, we compared the complexities of hierarchical trees and ASTs, exploring factors influencing hierarchical tree complexity. Experimental results reveal that the HT-EP algorithm excels in parsing and software fault detection capabilities compared to the other four algorithms. 
Furthermore, for expressions derived from real-world cases, HT-EP achieves an approximate 40% reduction in redundant algebraic operation steps and an average 63% reduction in runtime compared to AST-EP.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112354"},"PeriodicalIF":3.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143183248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Special Section on software engineering for hybrid quantum computing systems","authors":"Paolo Arcaini, Andriy Miranskyy, Hausi Müller","doi":"10.1016/j.jss.2025.112362","DOIUrl":"10.1016/j.jss.2025.112362","url":null,"abstract":"","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112362"},"PeriodicalIF":3.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143463321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cross-continental analysis of how regional cues shape top stack overflow contributors","authors":"Elijah Zolduoarrati , Sherlock A. Licorish , John Grundy","doi":"10.1016/j.jss.2025.112338","DOIUrl":"10.1016/j.jss.2025.112338","url":null,"abstract":"<div><div>Stack Overflow offers valuable knowledge for software developers, but studies suggest digital information tends to cluster geographically, limiting access to necessary knowledge for innovation. This study explores posts of top contributors on Stack Overflow across the United States, Brazil, India, Egypt, the United Kingdom, and Australia. We analyse platform activities, conduct social network analysis, employ topic modelling paired with thematic analysis, before dissecting their knowledge sharing patterns via directed content analysis. Results indicate that cultural factors, entrepreneurial activities, tech ecosystem maturity, as well as workforce diversity in a region were found to shape how top contributors contribute. For instance, individualistic users communicate directly whilst collectivistic users prefer subtle communication and socio-emotional cues. Moreover, top contributors in nascent technology ecosystems were more likely to discuss fundamental concepts, while those in mature ecosystems focus on specialised niches. 
This study sheds light on how diversity in human aspects may influence the dynamics of CQA settings, where future researchers can explicate the extent to which latent contextual factors affect user contributions and community structure.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112338"},"PeriodicalIF":3.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143277208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}