Bangchao Wang , Yang Deng , Ruiqi Luo , Peng Liang , Tingting Bi
{"title":"MPLinker: Multi-template Prompt-tuning with adversarial training for Issue–commit Link recovery","authors":"Bangchao Wang , Yang Deng , Ruiqi Luo , Peng Liang , Tingting Bi","doi":"10.1016/j.jss.2025.112351","DOIUrl":"10.1016/j.jss.2025.112351","url":null,"abstract":"<div><div>In recent years, the pre-training, prompting and prediction paradigm, known as prompt-tuning, has achieved significant success in Natural Language Processing (NLP). Issue–commit Link Recovery (ILR) in Software Traceability (ST) plays an important role in improving the reliability, quality, and security of software systems. The current ILR methods convert the ILR into a classification task using pre-trained language models (PLMs) and dedicated neural networks. These methods do not fully utilize the semantic information embedded in PLMs, failing to achieve acceptable performance. To address this limitation, we introduce a novel paradigm: <strong>Multi-template Prompt-tuning</strong> with adversarial training for issue–commit <strong>Link</strong> recovery (MPLinker). MPLinker redefines the ILR task as a cloze task via template-based prompt-tuning and incorporates adversarial training to enhance model generalization and reduce overfitting. We evaluated MPLinker on six open-source projects using a comprehensive set of performance metrics. The experiment results demonstrate that MPLinker achieves an average F1-score of 96.10%, Precision of 96.49%, Recall of 95.92%, MCC of 94.04%, AUC of 96.05%, and ACC of 98.15%, significantly outperforming existing state-of-the-art methods. Overall, MPLinker improves the performance and generalization of ILR models and introduces innovative concepts and methods for ILR. 
The replication package for MPLinker is available at <span>https://github.com/WTU-intelligent-software-development/MPLinker</span>.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112351"},"PeriodicalIF":3.7,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143350866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Different approaches for testing body sensor network applications","authors":"Samira Silva , Ricardo Caldas , Patrizio Pelliccione , Antonia Bertolino","doi":"10.1016/j.jss.2025.112336","DOIUrl":"10.1016/j.jss.2025.112336","url":null,"abstract":"<div><div>Body Sensor Networks (BSNs) offer a cost-effective way to monitor patients’ health and detect potential risks. Despite the growing interest attracted by BSNs, there is a lack of testing approaches for them. Testing a Body Sensor Network (BSN) is challenging due to its evolving nature, the complexity of sensor scenarios and their fusion, the potential necessity of third-party testing for certification, and the need to prioritize critical failures given limited resources. This paper addresses these challenges by proposing three BSN testing approaches: PASTA, ValComb, and TransCov. These approaches share common characteristics, which are described through a general framework called GATE4BSN. PASTA simulates patients with sensors and models sensor trends using a Discrete Time Markov Chain (DTMC). ValComb explores various health conditions by considering all sensor risk level combinations, while TransCov ensures full coverage of DTMC transitions. We empirically evaluate these approaches, comparing them with a baseline approach in terms of failure detection. The results demonstrate that PASTA, ValComb, and TransCov uncover previously undetected failures in an open-source BSN and outperform the baseline approach. 
Statistical analysis reveals that PASTA is the most effective, while ValComb is 76 times faster than PASTA and nearly as effective.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112336"},"PeriodicalIF":3.7,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143377629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical tree-based algorithms for efficient expression parsing and test sequence generation in software models","authors":"Yihao Li , Pan Liu","doi":"10.1016/j.jss.2025.112354","DOIUrl":"10.1016/j.jss.2025.112354","url":null,"abstract":"<div><div>The software expression model serves as a formalized specification, accurately depicting software behavior and generating test sequences through algebraic operations derived from the model. Typically, automated algebraic manipulation involves constructing an abstract syntax tree (AST) for the expression, followed by traversing it to identify subexpressions. However, this approach introduces a significant amount of redundant algebraic operations, diminishing the efficiency of expression parsing. To address this challenge, this paper introduces HT-EP, an innovative hierarchical tree-based expression parsing algorithm. HT-EP transforms expressions into hierarchical trees, utilizing algebraic operations to process nodes efficiently and generate streamlined test sequences. Compared to ASTs, hierarchical trees exhibit a simplified structure with fewer nodes, enabling faster traversal. Our experiment involved 124 expressions from scholarly papers over the past six decades and core functional expressions from 15 open-source software projects. The goal was to assess the parsing and fault detection capabilities of HT-EP against four other expression parsing algorithms. Additionally, we compared the complexities of hierarchical trees and ASTs, exploring factors influencing hierarchical tree complexity. Experimental results reveal that the HT-EP algorithm excels in parsing and software fault detection capabilities compared to the other four algorithms. 
Furthermore, for expressions derived from real-world cases, HT-EP achieves an approximately 40% reduction in redundant algebraic operation steps and an average 63% reduction in runtime compared to AST-EP.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112354"},"PeriodicalIF":3.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143183248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Special Section on software engineering for hybrid quantum computing systems","authors":"Paolo Arcaini, Andriy Miranskyy, Hausi Müller","doi":"10.1016/j.jss.2025.112362","DOIUrl":"10.1016/j.jss.2025.112362","url":null,"abstract":"","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112362"},"PeriodicalIF":3.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143463321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elijah Zolduoarrati , Sherlock A. Licorish , John Grundy
{"title":"A cross-continental analysis of how regional cues shape top stack overflow contributors","authors":"Elijah Zolduoarrati , Sherlock A. Licorish , John Grundy","doi":"10.1016/j.jss.2025.112338","DOIUrl":"10.1016/j.jss.2025.112338","url":null,"abstract":"<div><div>Stack Overflow offers valuable knowledge for software developers, but studies suggest digital information tends to cluster geographically, limiting access to necessary knowledge for innovation. This study explores posts of top contributors on Stack Overflow across the United States, Brazil, India, Egypt, the United Kingdom, and Australia. We analyse platform activities, conduct social network analysis, employ topic modelling paired with thematic analysis, before dissecting their knowledge sharing patterns via directed content analysis. Results indicate that cultural factors, entrepreneurial activities, tech ecosystem maturity, as well as workforce diversity in a region were found to shape how top contributors contribute. For instance, individualistic users communicate directly whilst collectivistic users prefer subtle communication and socio-emotional cues. Moreover, top contributors in nascent technology ecosystems were more likely to discuss fundamental concepts, while those in mature ecosystems focus on specialised niches. 
This study sheds light on how diversity in human aspects may influence the dynamics of CQA settings, where future researchers can explicate the extent to which latent contextual factors affect user contributions and community structure.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112338"},"PeriodicalIF":3.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143277208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Donghui Gao , Changjian Liu , Ningjiang Chen , Xiaochun Hu
{"title":"LogGzip: Towards log Parsing with lossless compression","authors":"Donghui Gao , Changjian Liu , Ningjiang Chen , Xiaochun Hu","doi":"10.1016/j.jss.2025.112349","DOIUrl":"10.1016/j.jss.2025.112349","url":null,"abstract":"<div><div>Automated analysis of complex logs from Internet of Things (IoT) devices facilitates failure diagnosis and system status monitoring. Log parsing, the first step in this process, converts raw logs into structured data. Due to the vast size and intricate structure of IoT system logs, parsers must effectively handle various log formats. Supervised learning parsers require labor-intensive manual data labeling. Clustering-based parsers, as an unsupervised method, minimize expert involvement and manual annotation. However, existing clustering-based parsers struggle with the diverse formats of log data and with minor variations or noise within logs, due to their reliance on specific log structures or the need to transform logs into particular representations. To address these problems, the paper proposes LogGzip, a clustering log parser based on the gzip lossless compressor. It employs a gzip compressor to measure differences in compressed lengths between logs to identify the complex patterns and regularities in the logs, and designs a compression distance calculation method to construct a distance matrix as a measure of log event similarity. At the same time, the overhead in the compression process is reduced by building a compression dictionary. Finally, clustering analysis is performed using the similarity scores. 
Experimental results demonstrate that LogGzip achieves higher parsing accuracy than existing state-of-the-art log parsers.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112349"},"PeriodicalIF":3.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143378432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
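Illustrative note on the LogGzip entry above: the abstract describes measuring differences in gzip-compressed lengths between logs and collecting them into a distance matrix. The sketch below shows one common way such a measure can be built, the normalized compression distance (NCD). This is an assumption for illustration only: the abstract does not state which distance formula LogGzip uses, and the function names (compressed_len, ncd, distance_matrix) are hypothetical, not taken from the paper or its code.

```python
import gzip


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two log lines.

    Assumed formula (not confirmed by the abstract):
    (C(a + b) - min(C(a), C(b))) / max(C(a), C(b)).
    """
    ca, cb = compressed_len(a), compressed_len(b)
    cab = compressed_len(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def distance_matrix(logs: list[str]) -> list[list[float]]:
    """Pairwise distance matrix; each pair is computed once and mirrored."""
    n = len(logs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ncd(logs[i], logs[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    sample = [
        "Connection from 10.0.0.1 closed",
        "Connection from 10.0.0.2 closed",
        "Disk usage at 91 percent on /dev/sda1",
    ]
    for row in distance_matrix(sample):
        print([round(d, 3) for d in row])
```

Log lines that share a template compress well together and therefore receive a small distance, so a matrix of this kind can be passed to any standard clustering routine. The compression dictionary mentioned in the abstract, which reduces compression overhead, is not modeled here.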
Jianzhang Zhang , Jialong Zhou , Jinping Hua , Nan Niu , Chuang Liu
{"title":"Mining user privacy concern topics from app reviews","authors":"Jianzhang Zhang , Jialong Zhou , Jinping Hua , Nan Niu , Chuang Liu","doi":"10.1016/j.jss.2025.112355","DOIUrl":"10.1016/j.jss.2025.112355","url":null,"abstract":"<div><h3>Context:</h3><div>As mobile applications (apps) widely spread throughout our society and daily life, various personal information is constantly demanded by apps in exchange for more intelligent and customized functionality. An increasing number of users are voicing their privacy concerns through app reviews on app stores.</div></div><div><h3>Objective:</h3><div>The main challenge of effectively mining privacy concerns from user reviews lies in that reviews expressing privacy concerns are overridden by a large number of reviews expressing more generic themes and noisy content. In this work, we propose a novel automated approach to overcome that challenge.</div></div><div><h3>Method:</h3><div>Our approach first employs information retrieval and document embeddings to extract candidate privacy reviews in an unsupervised manner, which are further labeled to prepare the annotation dataset. Then, supervised classifiers are trained to automatically identify privacy reviews. Finally, an interpretable topic mining algorithm is designed to detect privacy concern topics contained in the privacy reviews.</div></div><div><h3>Results:</h3><div>Experimental results show that the best performing document embedding achieves an average precision of 96.80% in the top 100 retrieved candidate privacy reviews, outperforming the taxonomy-based baseline, which achieves 73.87%. All trained privacy review classifiers achieve an <span><math><msub><mrow><mi>F</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span> score above 91%, surpassing the keyword-matching baseline by as much as 7.5% and the large language model baseline by up to 2.74%. 
For detecting privacy concern topics from privacy reviews, our proposed algorithm achieves both better topic coherence and topic diversity than three strong topic modeling baselines, including LDA.</div></div><div><h3>Conclusion:</h3><div>Empirical evaluation results demonstrate the effectiveness of our approach in identifying privacy reviews and detecting user privacy concerns in app reviews.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112355"},"PeriodicalIF":3.7,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oluwafemi Odu , Alvine B. Belle , Song Wang , Segla Kpodjedo , Timothy C. Lethbridge , Hadi Hemmati
{"title":"Automatic instantiation of assurance cases from patterns using large language models","authors":"Oluwafemi Odu , Alvine B. Belle , Song Wang , Segla Kpodjedo , Timothy C. Lethbridge , Hadi Hemmati","doi":"10.1016/j.jss.2025.112353","DOIUrl":"10.1016/j.jss.2025.112353","url":null,"abstract":"<div><div>An assurance case is a structured set of arguments supported by evidence, demonstrating that a system’s non-functional requirements (e.g., safety, security, reliability) have been correctly implemented. Assurance case patterns serve as templates derived from previous successful assurance cases, aimed at facilitating the creation of new assurance cases. Despite using these patterns to generate assurance cases, their instantiation remains a largely manual and error-prone process that heavily relies on domain expertise. Thus, exploring techniques to support their automatic instantiation becomes crucial. This study aims to investigate the potential of Large Language Models (LLMs) in automating the generation of assurance cases that comply with specific patterns. Specifically, we formalize assurance case patterns using predicate-based rules and then utilize LLMs, i.e., GPT-4o and GPT-4 Turbo, to automatically instantiate assurance cases from these formalized patterns. Our findings suggest that LLMs can generate assurance cases that comply with the given patterns. However, this study also highlights that LLMs may struggle with understanding some nuances related to pattern-specific relationships. While LLMs exhibit potential in the automatic generation of assurance cases, their capabilities still fall short compared to human experts. 
Therefore, a semi-automatic approach to instantiating assurance cases may be more practical at this time.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112353"},"PeriodicalIF":3.7,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Auri M.R. Vincenzi , Pedro H. Kuroishi , João Bispo , Ana R.C. da Veiga , David R.C. da Mata , Francisco B. Azevedo , Ana C.R. Paiva
{"title":"METFORD – Mutation tEsTing Framework fOR anDroid","authors":"Auri M.R. Vincenzi , Pedro H. Kuroishi , João Bispo , Ana R.C. da Veiga , David R.C. da Mata , Francisco B. Azevedo , Ana C.R. Paiva","doi":"10.1016/j.jss.2024.112332","DOIUrl":"10.1016/j.jss.2024.112332","url":null,"abstract":"<div><div>Mutation testing may be used to guide test case generation and as a technique to assess the quality of test suites. Despite being used frequently, mutation testing is not so commonly applied in the mobile world. One critical challenge in mutation testing is dealing with its computational cost. Generating mutants, running test cases over each mutant, and analyzing the results may require significant time and resources. This research aims to contribute to reducing Android mutation testing costs. It implements mutation testing operators (traditional and Android-specific) according to mutant schemata (implementing multiple mutants into a single code file). It also describes an Android mutation testing framework developed to execute test cases and determine mutation scores. Additional mutation operators can be implemented in JavaScript and easily integrated into the framework. The overall approach is validated through case studies showing that mutant schemata have advantages over the traditional mutation strategy (one file per mutant). The results show mutant schemata overcome traditional mutation in all evaluated aspects with no additional cost: it takes 8.50% less time for mutant generation, requires 99.78% less disk space, and runs, on average, 6.45% faster than traditional mutation. 
Moreover, considering sustainability metrics, mutant schemata have an 8.18% smaller carbon footprint than the traditional strategy.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112332"},"PeriodicalIF":3.7,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wei Zheng , Chang Liu , Peiran Deng , Xiang Chen , Xiaoxue Wu
{"title":"Enhancing concurrency vulnerability detection through AST-based static fuzz mutation","authors":"Wei Zheng , Chang Liu , Peiran Deng , Xiang Chen , Xiaoxue Wu","doi":"10.1016/j.jss.2025.112352","DOIUrl":"10.1016/j.jss.2025.112352","url":null,"abstract":"<div><div>As multi-threaded and highly concurrent programs are increasingly used, their inherent uncertainty significantly impacts program stability. Traditional testing methods often struggle to effectively detect specific concurrency vulnerabilities because these vulnerabilities are triggered only under particular circumstances, making detection at the vulnerability-triggering level challenging. In view of this, we propose a static fuzz mutation testing method based on the Abstract Syntax Tree (AST). This method leverages the fine granularity of ASTs to optimize test suites for concurrency vulnerability detection. Initially, we analyze and classify the concurrency vulnerabilities found in Go source code, and generate vulnerability feature mutation operators (mutation operators with concurrency vulnerability features). Next, we propose a static fuzz mutation method and heuristic algorithms at the AST level and apply them to mutators. Ultimately, we screened over 200 code slices from 8 open-source projects for static fuzz mutation testing. The results indicate that introducing vulnerability feature mutation operators increased the number of mutants by approximately 22.15% across various types of concurrency vulnerability samples. This enhancement elevated the probability of triggering concurrency vulnerabilities in the program. 
After incorporating data from the Go language standard library, experiments further confirmed that our proposed static fuzz mutation testing method can effectively improve the accuracy of concurrency vulnerability detection.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112352"},"PeriodicalIF":3.7,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}