{"title":"Introduction to the Special Section on software engineering for hybrid quantum computing systems","authors":"Paolo Arcaini, Andriy Miranskyy, Hausi Müller","doi":"10.1016/j.jss.2025.112362","DOIUrl":"10.1016/j.jss.2025.112362","url":null,"abstract":"","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112362"},"PeriodicalIF":3.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143463321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elijah Zolduoarrati, Sherlock A. Licorish, John Grundy
{"title":"A cross-continental analysis of how regional cues shape top stack overflow contributors","authors":"Elijah Zolduoarrati , Sherlock A. Licorish , John Grundy","doi":"10.1016/j.jss.2025.112338","DOIUrl":"10.1016/j.jss.2025.112338","url":null,"abstract":"<div><div>Stack Overflow offers valuable knowledge for software developers, but studies suggest digital information tends to cluster geographically, limiting access to necessary knowledge for innovation. This study explores posts of top contributors on Stack Overflow across the United States, Brazil, India, Egypt, the United Kingdom, and Australia. We analyse platform activities, conduct social network analysis, and employ topic modelling paired with thematic analysis, before dissecting their knowledge-sharing patterns via directed content analysis. Results indicate that cultural factors, entrepreneurial activities, tech ecosystem maturity, and workforce diversity in a region shape how top contributors contribute. For instance, individualistic users communicate directly whilst collectivistic users prefer subtle communication and socio-emotional cues. Moreover, top contributors in nascent technology ecosystems were more likely to discuss fundamental concepts, while those in mature ecosystems focus on specialised niches. 
This study sheds light on how diversity in human aspects may influence the dynamics of CQA settings; future researchers can explicate the extent to which latent contextual factors affect user contributions and community structure.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112338"},"PeriodicalIF":3.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143277208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Donghui Gao, Changjian Liu, Ningjiang Chen, Xiaochun Hu
{"title":"LogGzip: Towards log Parsing with lossless compression","authors":"Donghui Gao , Changjian Liu , Ningjiang Chen , Xiaochun Hu","doi":"10.1016/j.jss.2025.112349","DOIUrl":"10.1016/j.jss.2025.112349","url":null,"abstract":"<div><div>Automated analysis of complex logs from Internet of Things (IoT) devices facilitates failure diagnosis and system status monitoring. Log parsing, the first step in this process, converts raw logs into structured data. Due to the vast size and intricate structure of IoT system logs, parsers must effectively handle various log formats. Supervised learning parsers require labor-intensive manual data labeling. Clustering-based parsers, as an unsupervised method, minimize expert involvement and manual annotation. However, existing clustering-based parsers struggle with the diverse formats of log data and with minor variations or noise within logs, due to their reliance on specific log structures or the need to transform logs into particular representations. To address these problems, this paper proposes LogGzip, a clustering log parser based on the gzip lossless compressor. It uses the gzip compressor to measure differences in compressed lengths between logs, identifying complex patterns and regularities in them, and designs a compression distance calculation method to construct a distance matrix as a measure of log event similarity. At the same time, the overhead of the compression process is reduced by building a compression dictionary. Finally, clustering analysis is performed using the similarity scores. 
Experimental results demonstrate that the parsing accuracy of LogGzip outperforms the existing state-of-the-art log parsers.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112349"},"PeriodicalIF":3.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143378432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianzhang Zhang, Jialong Zhou, Jinping Hua, Nan Niu, Chuang Liu
{"title":"Mining user privacy concern topics from app reviews","authors":"Jianzhang Zhang , Jialong Zhou , Jinping Hua , Nan Niu , Chuang Liu","doi":"10.1016/j.jss.2025.112355","DOIUrl":"10.1016/j.jss.2025.112355","url":null,"abstract":"<div><h3>Context:</h3><div>As mobile applications (apps) spread widely throughout society and daily life, apps constantly demand various personal information in exchange for more intelligent and customized functionality. An increasing number of users are voicing their privacy concerns through app reviews on app stores.</div></div><div><h3>Objective:</h3><div>The main challenge in effectively mining privacy concerns from user reviews is that reviews expressing privacy concerns are drowned out by a large number of reviews expressing more generic themes and noisy content. In this work, we propose a novel automated approach to overcome that challenge.</div></div><div><h3>Method:</h3><div>Our approach first employs information retrieval and document embeddings to extract candidate privacy reviews in an unsupervised manner, which are further labeled to prepare the annotation dataset. Then, supervised classifiers are trained to automatically identify privacy reviews. Finally, an interpretable topic mining algorithm is designed to detect privacy concern topics contained in the privacy reviews.</div></div><div><h3>Results:</h3><div>Experimental results show that the best performing document embedding achieves an average precision of 96.80% in the top 100 retrieved candidate privacy reviews, outperforming the taxonomy-based baseline, which achieves 73.87%. All trained privacy review classifiers achieve an F<sub>1</sub> score above 91%, surpassing the keyword-matching baseline by as much as 7.5% and the large language model baseline by up to 2.74%. 
For detecting privacy concern topics from privacy reviews, our proposed algorithm achieves both better topic coherence and topic diversity than three strong topic modeling baselines, including LDA.</div></div><div><h3>Conclusion:</h3><div>Empirical evaluation results demonstrate the effectiveness of our approach in identifying privacy reviews and detecting user privacy concerns in app reviews.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112355"},"PeriodicalIF":3.7,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oluwafemi Odu, Alvine B. Belle, Song Wang, Segla Kpodjedo, Timothy C. Lethbridge, Hadi Hemmati
{"title":"Automatic instantiation of assurance cases from patterns using large language models","authors":"Oluwafemi Odu , Alvine B. Belle , Song Wang , Segla Kpodjedo , Timothy C. Lethbridge , Hadi Hemmati","doi":"10.1016/j.jss.2025.112353","DOIUrl":"10.1016/j.jss.2025.112353","url":null,"abstract":"<div><div>An assurance case is a structured set of arguments supported by evidence, demonstrating that a system’s non-functional requirements (e.g., safety, security, reliability) have been correctly implemented. Assurance case patterns serve as templates derived from previous successful assurance cases, aimed at facilitating the creation of new assurance cases. Although these patterns are used to generate assurance cases, their instantiation remains a largely manual and error-prone process that relies heavily on domain expertise. Thus, exploring techniques to support their automatic instantiation becomes crucial. This study aims to investigate the potential of Large Language Models (LLMs) in automating the generation of assurance cases that comply with specific patterns. Specifically, we formalize assurance case patterns using predicate-based rules and then utilize LLMs, i.e., GPT-4o and GPT-4 Turbo, to automatically instantiate assurance cases from these formalized patterns. Our findings suggest that LLMs can generate assurance cases that comply with the given patterns. However, this study also highlights that LLMs may struggle with understanding some nuances related to pattern-specific relationships. While LLMs exhibit potential in the automatic generation of assurance cases, their capabilities still fall short compared to human experts. 
Therefore, a semi-automatic approach to instantiating assurance cases may be more practical at this time.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112353"},"PeriodicalIF":3.7,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wei Zheng, Chang Liu, Peiran Deng, Xiang Chen, Xiaoxue Wu
{"title":"Enhancing concurrency vulnerability detection through AST-based static fuzz mutation","authors":"Wei Zheng , Chang Liu , Peiran Deng , Xiang Chen , Xiaoxue Wu","doi":"10.1016/j.jss.2025.112352","DOIUrl":"10.1016/j.jss.2025.112352","url":null,"abstract":"<div><div>As multi-threaded and highly concurrent programs are increasingly used, their inherent uncertainty significantly impacts program stability. Traditional testing methods often struggle to effectively detect specific concurrency vulnerabilities because these vulnerabilities are triggered only under particular circumstances, making detection at the vulnerability-triggering level challenging. In view of this, we propose a static fuzz mutation testing method based on the Abstract Syntax Tree (AST). This method leverages the fine granularity of ASTs to optimize test suites for concurrency vulnerability detection. First, we analyze and classify the concurrency vulnerabilities found in Go source code, and generate vulnerability feature mutation operators (mutation operators with concurrency vulnerability features). Next, we propose a static fuzz mutation method and heuristic algorithms at the AST level and apply them to mutators. Finally, we screened over 200 code slices from 8 open-source projects for static fuzz mutation testing. The results indicate that introducing vulnerability feature mutation operators increased the number of mutants by approximately 22.15% across various types of concurrency vulnerability samples. This enhancement raised the probability of triggering concurrency vulnerabilities in the program. 
After incorporating data from the Go language standard library, experiments further confirmed that our proposed static fuzz mutation testing method can effectively improve the accuracy of concurrency vulnerability detection.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112352"},"PeriodicalIF":3.7,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Auri M.R. Vincenzi, Pedro H. Kuroishi, João Bispo, Ana R.C. da Veiga, David R.C. da Mata, Francisco B. Azevedo, Ana C.R. Paiva
{"title":"METFORD – Mutation tEsTing Framework fOR anDroid","authors":"Auri M.R. Vincenzi , Pedro H. Kuroishi , João Bispo , Ana R.C. da Veiga , David R.C. da Mata , Francisco B. Azevedo , Ana C.R. Paiva","doi":"10.1016/j.jss.2024.112332","DOIUrl":"10.1016/j.jss.2024.112332","url":null,"abstract":"<div><div>Mutation testing may be used to guide test case generation and as a technique to assess the quality of test suites. Despite being used frequently, mutation testing is not so commonly applied in the mobile world. One critical challenge in mutation testing is dealing with its computational cost. Generating mutants, running test cases over each mutant, and analyzing the results may require significant time and resources. This research aims to contribute to reducing Android mutation testing costs. It implements mutation testing operators (traditional and Android-specific) according to mutant schemata (implementing multiple mutants into a single code file). It also describes an Android mutation testing framework developed to execute test cases and determine mutation scores. Additional mutation operators can be implemented in JavaScript and easily integrated into the framework. The overall approach is validated through case studies showing that mutant schemata have advantages over the traditional mutation strategy (one file per mutant). The results show that mutant schemata outperform traditional mutation in all evaluated aspects with no additional cost: they take 8.50% less time for mutant generation, require 99.78% less disk space, and run, on average, 6.45% faster than traditional mutation. 
Moreover, considering sustainability metrics, mutant schemata have an 8.18% smaller carbon footprint than the traditional strategy.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112332"},"PeriodicalIF":3.7,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wenwei Lan, Chen Huang, Tingting Yu, Li Li, Zhanqi Cui
{"title":"BaSFuzz: Fuzz testing based on difference analysis for seed bytes","authors":"Wenwei Lan , Chen Huang , Tingting Yu , Li Li , Zhanqi Cui","doi":"10.1016/j.jss.2025.112340","DOIUrl":"10.1016/j.jss.2025.112340","url":null,"abstract":"<div><div>Coverage-guided Greybox Fuzzing (CGF) is one of the most effective dynamic software testing techniques, which focuses on improving code coverage. The methodology automatically generates new offspring test cases by mutating existing test cases and analyzing program execution, preserving the interesting test cases as seeds for subsequent mutations. However, existing CGF tools often neglect the similarity between seeds. The mutation of similar seeds can yield a multitude of similar offspring test cases, which then execute similar code segments of the program under test. This challenge hinders the improvement of code coverage, consequently impacting the efficiency of fuzz testing.</div><div>To address this issue, this paper proposes a fuzz testing method, BaSFuzz, based on difference analysis for seed bytes. The method leverages both byte similarity and structure similarity to analyze the differences between seed bytes. Subsequently, it computes a similarity score for each seed and reorders the seed queue in ascending order of similarity scores.</div><div>Based on this method, a prototype tool is developed and compared with AFL, AFLFast, MOpt, AFL++-Hier and HTFuzz on 12 target programs. The experimental results indicate that BaSFuzz achieved 190.14%, 143.9%, 10.93%, 374.85% and 11.79% more edge coverage compared to the five tools, respectively. 
Additionally, BaSFuzz triggered unique crashes 3.57 times, 1.46 times, 42.38%, 2.85 times, and 33.44% more than the five tools, respectively.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112340"},"PeriodicalIF":3.7,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Riccardo Coppola, Tommaso Fulcini, Luca Ardito, Marco Torchiano
{"title":"Kotlin assimilating the Android ecosystem: An appraisal of diffusion and impact on maintainability","authors":"Riccardo Coppola, Tommaso Fulcini, Luca Ardito, Marco Torchiano","doi":"10.1016/j.jss.2025.112346","DOIUrl":"10.1016/j.jss.2025.112346","url":null,"abstract":"<div><div>Kotlin was introduced in 2011 as an alternative to the Java programming language, promising to address many of its predecessor’s limitations and positioning itself as a better option for application maintainability. In 2017, Kotlin became a first-class language for Android application development, complete with extensive tool support.</div><div>This paper aims to empirically assess the diffusion of Kotlin in developing Android applications and to investigate the impact of Kotlin adoption on application maintainability.</div><div>We mined 2708 open-source Android applications from F-Droid, focusing on the extent of Kotlin code presence, their popularity, and maintainability. This analysis adopted a set of six code metrics proxies.</div><div>The proportion of applications developed with Kotlin, either in conjunction with Java or exclusively, has continuously increased over the past five years. Currently, Kotlin is used in approximately 40% of the projects. The adoption of Kotlin in application development appears to be linked to greater popularity among end-users and developers when compared to the applications written in Java. Notably, the exclusive use of Kotlin in projects significantly enhances all the considered code maintainability metrics.</div><div>We conclude that Kotlin is rapidly gaining ground in the Android ecosystem. 
This trend is likely due to Kotlin’s fulfilment of its promise as a superior alternative to Java, particularly in terms of maintainability.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112346"},"PeriodicalIF":3.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xueyin Wei, Jing Li, Xudong He, Weizhou Peng, Ying Zhu, Rongbin Gu, Yunlong Zhu, Jun Huang
{"title":"Extracting microservices from monolithic applications using consistent graph enhanced Graph Transformer","authors":"Xueyin Wei , Jing Li , Xudong He , Weizhou Peng , Ying Zhu , Rongbin Gu , Yunlong Zhu , Jun Huang","doi":"10.1016/j.jss.2025.112345","DOIUrl":"10.1016/j.jss.2025.112345","url":null,"abstract":"<div><div>With the continuous development of cloud computing, the advantages of microservice architecture have become increasingly obvious compared with monolithic programs. Consequently, numerous firms are engaged in research aimed at transitioning legacy monolithic applications to microservices, thereby enabling them to maximize the advantages of cloud-based deployment. Manual decomposition into microservices is both time-consuming and labor-intensive, while traditional automated microservice decomposition methods cannot effectively integrate the rich structural and semantic information of monolithic programs. To address this, we propose a microservice extraction method based on consistent graph clustering (GC-VCG). Initially, a static analysis strategy is employed to extract dependencies between classes within the monolithic application, as well as the textual information utilized in the class creation process, to construct both the structural and semantic views. Subsequently, a consistent graph enhanced Graph Transformer is utilized to learn a unified graph from both structural and semantic views. Lastly, the k-means clustering algorithm is applied to cluster the nodes, thereby identifying candidate microservices. To verify the effectiveness of GC-VCG, this paper compares it with multiple baseline methods on four publicly available monolithic applications. 
The results show the effectiveness of GC-VCG.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112345"},"PeriodicalIF":3.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}