Jianzhang Zhang , Jialong Zhou , Jinping Hua , Nan Niu , Chuang Liu
{"title":"Mining user privacy concern topics from app reviews","authors":"Jianzhang Zhang , Jialong Zhou , Jinping Hua , Nan Niu , Chuang Liu","doi":"10.1016/j.jss.2025.112355","DOIUrl":"10.1016/j.jss.2025.112355","url":null,"abstract":"<div><h3>Context:</h3><div>As mobile applications (apps) widely spread throughout our society and daily life, various personal information is constantly demanded by apps in exchange for more intelligent and customized functionality. An increasing number of users are voicing their privacy concerns through app reviews on app stores.</div></div><div><h3>Objective:</h3><div>The main challenge of effectively mining privacy concerns from user reviews lies in that reviews expressing privacy concerns are overridden by a large number of reviews expressing more generic themes and noisy content. In this work, we propose a novel automated approach to overcome that challenge.</div></div><div><h3>Method:</h3><div>Our approach first employs information retrieval and document embeddings to extract candidate privacy reviews in an unsupervised manner, which are further labeled to prepare the annotation dataset. Then, supervised classifiers are trained to automatically identify privacy reviews. Finally, an interpretable topic mining algorithm is designed to detect privacy concern topics contained in the privacy reviews.</div></div><div><h3>Results:</h3><div>Experimental results show that the best performing document embedding achieves an average precision of 96.80% in the top 100 retrieved candidate privacy reviews, outperforming the taxonomy-based baseline, which achieves 73.87%. All trained privacy review classifiers achieve an <span><math><msub><mrow><mi>F</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span> score above 91%, surpassing the keyword-matching baseline by as much as 7.5% and the large language model baseline by up to 2.74%. 
For detecting privacy concern topics from privacy reviews, our proposed algorithm achieves both better topic coherence and topic diversity than three strong topic modeling baselines, including LDA.</div></div><div><h3>Conclusion:</h3><div>Empirical evaluation results demonstrate the effectiveness of our approach in identifying privacy reviews and detecting user privacy concerns in app reviews.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112355"},"PeriodicalIF":3.7,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oluwafemi Odu , Alvine B. Belle , Song Wang , Segla Kpodjedo , Timothy C. Lethbridge , Hadi Hemmati
{"title":"Automatic instantiation of assurance cases from patterns using large language models","authors":"Oluwafemi Odu , Alvine B. Belle , Song Wang , Segla Kpodjedo , Timothy C. Lethbridge , Hadi Hemmati","doi":"10.1016/j.jss.2025.112353","DOIUrl":"10.1016/j.jss.2025.112353","url":null,"abstract":"<div><div>An assurance case is a structured set of arguments supported by evidence, demonstrating that a system’s non-functional requirements (e.g., safety, security, reliability) have been correctly implemented. Assurance case patterns serve as templates derived from previous successful assurance cases, aimed at facilitating the creation of new assurance cases. Despite using these patterns to generate assurance cases, their instantiation remains a largely manual and error-prone process that heavily relies on domain expertise. Thus, exploring techniques to support their automatic instantiation becomes crucial. This study aims to investigate the potential of Large Language Models (LLMs) in automating the generation of assurance cases that comply with specific patterns. Specifically, we formalize assurance case patterns using predicate-based rules and then utilize LLMs, i.e., GPT-4o and GPT-4 Turbo, to automatically instantiate assurance cases from these formalized patterns. Our findings suggest that LLMs can generate assurance cases that comply with the given patterns. However, this study also highlights that LLMs may struggle with understanding some nuances related to pattern-specific relationships. While LLMs exhibit potential in the automatic generation of assurance cases, their capabilities still fall short compared to human experts. 
Therefore, a semi-automatic approach to instantiating assurance cases may be more practical at this time.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112353"},"PeriodicalIF":3.7,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Auri M.R. Vincenzi , Pedro H. Kuroishi , João Bispo , Ana R.C. da Veiga , David R.C. da Mata , Francisco B. Azevedo , Ana C.R. Paiva
{"title":"METFORD – Mutation tEsTing Framework fOR anDroid","authors":"Auri M.R. Vincenzi , Pedro H. Kuroishi , João Bispo , Ana R.C. da Veiga , David R.C. da Mata , Francisco B. Azevedo , Ana C.R. Paiva","doi":"10.1016/j.jss.2024.112332","DOIUrl":"10.1016/j.jss.2024.112332","url":null,"abstract":"<div><div>Mutation testing may be used to guide test case generation and as a technique to assess the quality of test suites. Despite being used frequently, mutation testing is not so commonly applied in the mobile world. One critical challenge in mutation testing is dealing with its computational cost. Generating mutants, running test cases over each mutant, and analyzing the results may require significant time and resources. This research aims to contribute to reducing Android mutation testing costs. It implements mutation testing operators (traditional and Android-specific) according to mutant schemata (implementing multiple mutants into a single code file). It also describes an Android mutation testing framework developed to execute test cases and determine mutation scores. Additional mutation operators can be implemented in JavaScript and easily integrated into the framework. The overall approach is validated through case studies showing that mutant schemata have advantages over the traditional mutation strategy (one file per mutant). The results show mutant schemata overcome traditional mutation in all evaluated aspects with no additional cost: it takes 8.50% less time for mutant generation, requires 99.78% less disk space, and runs, on average, 6.45% faster than traditional mutation. 
Moreover, considering sustainability metrics, mutant schemata have an 8.18% smaller carbon footprint than the traditional strategy.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112332"},"PeriodicalIF":3.7,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wei Zheng , Chang Liu , Peiran Deng , Xiang Chen , Xiaoxue Wu
{"title":"Enhancing concurrency vulnerability detection through AST-based static fuzz mutation","authors":"Wei Zheng , Chang Liu , Peiran Deng , Xiang Chen , Xiaoxue Wu","doi":"10.1016/j.jss.2025.112352","DOIUrl":"10.1016/j.jss.2025.112352","url":null,"abstract":"<div><div>As multi-threaded and highly concurrent programs are increasingly used, their inherent uncertainty significantly impacts program stability. Traditional testing methods often struggle to effectively detect specific concurrency vulnerabilities because these vulnerabilities are triggered only under particular circumstances, making detection at the vulnerability-triggering level challenging. In view of this, we propose a static fuzz mutation testing method based on the Abstract Syntax Tree (AST). This method leverages the fine granularity of ASTs to optimize test suites for concurrency vulnerability detection. First, we analyze and classify the concurrency vulnerabilities found in Go source code, and generate vulnerability feature mutation operators (mutation operators with concurrency vulnerability features). Next, we propose a static fuzz mutation method and heuristic algorithms at the AST level and apply them to mutators. Finally, we screened over 200 code slices from 8 open-source projects for static fuzz mutation testing. The results indicate that introducing vulnerability feature mutation operators increased the number of mutants by approximately 22.15% across various types of concurrency vulnerability samples. This enhancement elevated the probability of triggering concurrency vulnerabilities in the program. 
After incorporating data from the Go language standard library, experiments further confirmed that our proposed static fuzz mutation testing method can effectively improve the accuracy of concurrency vulnerability detection.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112352"},"PeriodicalIF":3.7,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wenwei Lan , Chen Huang , Tingting Yu , Li Li , Zhanqi Cui
{"title":"BaSFuzz: Fuzz testing based on difference analysis for seed bytes","authors":"Wenwei Lan , Chen Huang , Tingting Yu , Li Li , Zhanqi Cui","doi":"10.1016/j.jss.2025.112340","DOIUrl":"10.1016/j.jss.2025.112340","url":null,"abstract":"<div><div>Coverage-guided Greybox Fuzzing (CGF) is one of the most effective dynamic software testing techniques and focuses on improving code coverage. The methodology automatically generates new offspring test cases by mutating existing test cases and analyzing program execution, preserving the interesting test cases as seeds for subsequent mutations. However, existing CGF tools often neglect the similarity between seeds. The mutation of similar seeds can yield a multitude of similar offspring test cases, which subsequently execute similar code segments of the program under test. This challenge hinders the improvement of code coverage, consequently impacting the efficiency of fuzz testing.</div><div>To address this issue, this paper proposes a fuzz testing method, BaSFuzz, based on difference analysis for seed bytes. The method leverages both byte similarity and structure similarity to analyze the differences between seed bytes. Subsequently, it computes a similarity score for each seed and reorders the seed queue in ascending order of similarity scores.</div><div>Based on this method, a prototype tool is developed and compared with AFL, AFLFast, MOpt, AFL++-Hier and HTFuzz on 12 target programs. The experimental results indicate that BaSFuzz achieved 190.14%, 143.9%, 10.93%, 374.85% and 11.79% more edge coverage compared to the five tools, respectively. 
Additionally, BaSFuzz triggered unique crashes 3.57 times, 1.46 times, 42.38%, 2.85 times, and 33.44% more than the five tools, respectively.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112340"},"PeriodicalIF":3.7,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Riccardo Coppola, Tommaso Fulcini, Luca Ardito, Marco Torchiano
{"title":"Kotlin assimilating the Android ecosystem: An appraisal of diffusion and impact on maintainability","authors":"Riccardo Coppola, Tommaso Fulcini, Luca Ardito, Marco Torchiano","doi":"10.1016/j.jss.2025.112346","DOIUrl":"10.1016/j.jss.2025.112346","url":null,"abstract":"<div><div>Kotlin was introduced in 2011 as an alternative to the Java programming language, promising to address many of its predecessor’s limitations and positioning itself as a better option for application maintainability. In 2017, Kotlin became a first-class language for Android application development, complete with extensive tool support.</div><div>This paper aims to empirically assess the diffusion of Kotlin in developing Android applications and to investigate the impact of Kotlin adoption on application maintainability.</div><div>We mined 2708 open-source Android applications from F-Droid, focusing on the extent of Kotlin code presence, their popularity, and maintainability. This analysis adopted a set of six code metrics proxies.</div><div>The proportion of applications developed with Kotlin, either in conjunction with Java or exclusively, has continuously increased over the past five years. Currently, Kotlin is used in approximately 40% of the projects. The adoption of Kotlin in application development appears to be linked to greater popularity among end-users and developers when compared to the applications written in Java. Notably, the exclusive use of Kotlin in projects significantly enhances all the considered code maintainability metrics.</div><div>We conclude that Kotlin is rapidly gaining ground in the Android ecosystem. 
This trend is likely due to Kotlin’s fulfilment of its promise as a superior alternative to Java, particularly in terms of maintainability.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112346"},"PeriodicalIF":3.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xueyin Wei , Jing Li , Xudong He , Weizhou Peng , Ying Zhu , Rongbin Gu , Yunlong Zhu , Jun Huang
{"title":"Extracting microservices from monolithic applications using consistent graph enhanced Graph Transformer","authors":"Xueyin Wei , Jing Li , Xudong He , Weizhou Peng , Ying Zhu , Rongbin Gu , Yunlong Zhu , Jun Huang","doi":"10.1016/j.jss.2025.112345","DOIUrl":"10.1016/j.jss.2025.112345","url":null,"abstract":"<div><div>With the continuous development of cloud computing, the advantages of microservice architecture have become increasingly obvious compared with monolithic programs. Consequently, numerous firms are engaged in research aimed at transitioning legacy monolithic applications to microservices, thereby enabling them to maximize the advantages of cloud-based deployment. To solve the problem that manual decomposition of microservices is both time-consuming and labor-intensive, while traditional automated microservice decomposition methods cannot effectively integrate the rich structural and semantic information of single programs, we propose a microservice extraction method based on consistent graph clustering (GC-VCG). Initially, a static analysis strategy is employed to extract dependencies between classes within the monolithic application, as well as the textual information utilized in the class creation process, to construct both the structural and semantic views. Subsequently, a consistent graph enhanced Graph Transformer is utilized to learn a unified graph from both structural and semantic views. Lastly, the k-means clustering algorithm is applied to cluster the nodes, thereby identifying candidate microservices. To verify the effectiveness of GC-VCG, this paper compares it with multiple baseline methods on four publicly available monolithic applications. 
The results show the effectiveness of GC-VCG.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112345"},"PeriodicalIF":3.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evangelos Ntentos , Stephen John Warnett , Uwe Zdun
{"title":"On the understandability of machine learning practices in deep learning and reinforcement learning based systems","authors":"Evangelos Ntentos , Stephen John Warnett , Uwe Zdun","doi":"10.1016/j.jss.2025.112343","DOIUrl":"10.1016/j.jss.2025.112343","url":null,"abstract":"<div><div>Machine learning (ML) has emerged as a transformative subject, using various algorithms to help systems analyze data and make predictions. Deep Learning (DL) uses neural networks to address hard problems. Reinforcement Learning (RL) is a way to solve problems by making consecutive decisions.</div><div>Understanding ML systems based only on the source code is often a challenging task, especially for inexperienced developers. In a controlled experiment involving one hundred fifty-eight participants, we assessed the understandability of ML-based systems and workflows through source code inspection compared to semi-formal representations in models and metrics.</div><div>We hypothesize that ML system diagrams modeling details of ML workflows and practices like transfer learning and checkpoints can enhance the understandability of ML practices in system design comprehension tasks, assessed through task <em>correctness</em>. Additionally, providing these sources could lead to an increase in task <em>duration</em>, and we expect a significant correlation between <em>correctness</em> and <em>duration</em>.</div><div>Our findings show that providing semi-formal ML system diagrams with the source code improves the effectiveness of the <em>correctness</em> for the DL relevant tasks. The control group had an average correctness of 0.7121, while the experimental group had a higher average correctness of 0.7759. 
On the other hand, participants who received only the system source code showed slightly better performance in the <em>correctness</em> task (average <em>correctness</em> 0.6808) within the RL relevant tasks compared to those who also received the semi-formal diagrams (average <em>correctness</em> of 0.6612). However, no significant difference was found in the <em>duration</em> task between the two. The control group, for the DL relevant tasks, took an average of 1571.62 s, whereas the experimental group took an average of 1763.85 s. For the RL relevant tasks, the control group had an average of 1883.80 s, while the experimental group 1925.46 s. However, semi-formal ML system diagrams can benefit specific scenarios.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112343"},"PeriodicalIF":3.7,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CIPAC: A framework of automated software construction based on collective intelligence","authors":"Jiaxin Liu, Yating Zhang, Yiwei Li, Tiecheng Ma, Wei Dong","doi":"10.1016/j.jss.2025.112335","DOIUrl":"10.1016/j.jss.2025.112335","url":null,"abstract":"<div><div>In software development, constructing programs efficiently and accurately is essential, which leads to the rise of program synthesis. However, the complexity of program spaces and the diversity of user intents restrict the scale and quality of the code generated, resulting in many works focused on generating function-level code. To address the automated generation of complex code, we propose an automated program construction process model CIPAC based on collective intelligence, which can generate software-level code automatically. CIPAC incorporates methods for the automated aggregation of collective intelligence, the construction of software structures and task specifications, efficient program generation and search, and the optimization of code quality and composition. CIPAC employs explainable program synthesis methods as its core to ensure reliability and leverages collective intelligence throughout the entire software development lifecycle. To validate the effectiveness of CIPAC, we conduct a case study for developing a matrix operation application and explore complex tasks in the aerospace domain. The results show that our platform enables the construction of the software project compared to existing methods. 
To demonstrate reliability, we conduct integration testing, system testing, and security validation, with results indicating that the generated project passes all tests without security issues.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112335"},"PeriodicalIF":3.7,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Best practices for evaluating IRFL approaches","authors":"Thomas Hirsch, Birgit Hofer","doi":"10.1016/j.jss.2025.112342","DOIUrl":"10.1016/j.jss.2025.112342","url":null,"abstract":"<div><div>Information retrieval fault localization (IRFL) is a popular research field and many IRFL approaches have been proposed recently. Unfortunately, the evaluation of some of these IRFL approaches is often too simplistic, which can cause an overestimation of performance of these approaches. In this paper, we discuss evaluation pitfalls and problems. Furthermore, we propose best practices to avoid them. In detail, we discuss evaluation strategies such as parameter tuning and temporal dependencies in the data, dataset issues, metrics, statistical significance testing, and the unavailability of supplemental material. To support our claim of the poor status quo of current evaluation practices in some research papers, we have performed a literature survey on 135 papers. We hope that this paper will help researchers to avoid the described pitfalls in their evaluation of IRFL approaches.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112342"},"PeriodicalIF":3.7,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}