{"title":"Pytester: Deep reinforcement learning for text-to-testcase generation","authors":"Wannita Takerngsaksiri , Rujikorn Charakorn , Chakkrit Tantithamthavorn , Yuan-Fang Li","doi":"10.1016/j.jss.2025.112381","DOIUrl":"10.1016/j.jss.2025.112381","url":null,"abstract":"<div><div>Test-driven development (TDD) is a widely-employed software development practice that mandates writing test cases based on a textual description <em>before</em> writing the actual code. While writing test cases is the centerpiece of TDD, it is time-consuming, expensive, and often shunned by developers. To address these issues associated with TDD, automated test case generation approaches have recently been investigated. Such approaches take source code as input, but not the textual description. Therefore, existing work does not fully support true TDD, as actual code is required to generate test cases. In addition, current deep learning-based test case generation approaches are trained with one learning objective, i.e., to generate test cases that are exactly matched with the ground-truth test cases. However, such approaches may limit the model’s ability to generate different yet correct test cases. In this paper, we introduce <span>PyTester</span>, a Text-to-Testcase generation approach that can automatically generate syntactically correct, executable, complete, and effective test cases while being aligned with a given textual description. We evaluate <span>PyTester</span> on the public APPS benchmark dataset, and the results show that our Deep RL approach enables <span>PyTester</span>, a small language model, to outperform much larger language models like GPT3.5, StarCoder, and InCoder. Our findings suggest that future research could consider improving small over large LMs for better resource efficiency by integrating the SE domain knowledge into the design of reinforcement learning architecture.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"224 ","pages":"Article 112381"},"PeriodicalIF":3.7,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EnseSmells : Deep ensemble and programming language models for automated code smells detection","authors":"Anh Ho , Anh M.T. Bui , Phuong T. Nguyen , Amleto Di Salle , Bach Le","doi":"10.1016/j.jss.2025.112375","DOIUrl":"10.1016/j.jss.2025.112375","url":null,"abstract":"<div><div>A smell in software source code denotes an indication of suboptimal design and implementation decisions, potentially hindering the code understanding and, in turn, raising the likelihood of being prone to changes and faults. Identifying these code issues at an early stage in the software development process can mitigate these problems and enhance the overall quality of the software. Current research primarily focuses on the utilization of deep learning-based models to investigate the contextual information concealed within source code instructions to detect code smells, with limited attention given to the importance of structural and design-related features. This paper proposes a novel approach to code smell detection, constructing a deep learning architecture that places importance on the fusion of structural features and statistical semantics derived from pre-trained models for programming languages. We further provide a thorough analysis of how different source code embedding models affect the detection performance with respect to different code smell types. Using four widely-used code smells from well-designed datasets, our empirical study shows that incorporating design-related features significantly improves detection accuracy, outperforming state-of-the-art methods on the MLCQ dataset with improvements ranging from 5.98% to 28.26%, depending on the type of code smell.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"224 ","pages":"Article 112375"},"PeriodicalIF":3.7,"publicationDate":"2025-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143438142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architectural tactics to achieve quality attributes of machine-learning-enabled systems: a systematic literature review","authors":"Vladislav Indykov , Daniel Strüber , Rebekka Wohlrab","doi":"10.1016/j.jss.2025.112373","DOIUrl":"10.1016/j.jss.2025.112373","url":null,"abstract":"<div><div>Machine-learning-enabled systems are becoming increasingly common in different industries. Due to the impact of uncertainty and the pronounced role of data, ensuring the quality of such systems requires consideration of several unique characteristics in addition to traditional ones. This range of quality attributes can be achieved by the implementation of specific architectural tactics. Such architectural decisions affect the further functioning of the system and its compliance with business goals. Architectural decisions have to be made with attention to possible quality trade-offs to prevent the cost of mitigating unintended side effects. A related work analysis revealed the need for a thorough study of existing architectural decisions and their impact on various quality attributes in the context of machine-learning-enabled systems. In this paper, to address this goal, we present comprehensive research on the quality of such systems, architectural tactics, and their possible quality consequences. Based on a systematic literature review of 206 primary sources, we identified 11 common quality attributes, and 16 relevant architectural tactics together along with 85 potential quality trade-offs. Our results systematize existing research in building architectures of ML-enabled systems. They can be used by software architects and researchers at the system design stage to estimate the possible consequences of decisions made.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112373"},"PeriodicalIF":3.7,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the performance of software fault localization with effective coverage data reduction techniques","authors":"Chih-Chiang Fang , Chin-Yu Huang , Shou-Yu Lee , Yao-Hsien Tseng , C.W. Chu","doi":"10.1016/j.jss.2025.112388","DOIUrl":"10.1016/j.jss.2025.112388","url":null,"abstract":"<div><div>Fault localization (FL) techniques are widely used to identify the exact location of faulty statement in programs. Three common FL families are SBFL, MBFL, and deep learning-based FL, respectively. Before running any FL methods, coverage data is usually considered as input of FL stage. Therefore, coverage data plays an important role in FL field. On the other hand, if coverage data can be reduced effectively, the performance of FL will be greatly improved. In past studies, filtering out fault-irrelevant statements based on solely failed test cases, the traditional principal component analysis (PCA), and revised PCA techniques were applied to minimize coverage data. However, these approaches have a great opportunity to remove the actual faulty statement, especially in multiple fault localization (MFL). Tracing their root causes does not reflect the actual status of each statement. In this paper, we propose two approaches to improve the situations of deleted faulty statements. For the first approach, called Revised PCA with Ensemble Weight Integration (RPCA-EWI), it updates the contribution value of each statement based on revised PCA and incorporate the results of different combinations of failed and passed test cases. For the second approach, called Revised PCA with Important List Checking (RPCA-ILC), we establish a list of the top N% important statements by using the results of different test case combinations. If the deleted statement appears within this list, preserve it in reduced coverage data. Otherwise, it discards directly. We selected three Linux open-source codes (Gzip, Grep, and Sed) with 4 fault injections to validate the correctness. From the analysis of various perspectives, experimental results show that there is a significant improvement in shortening execution time of the FL process, and also can alleviate the situations for removed faulty statements compared to PCA and the revised PCA methods.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"226 ","pages":"Article 112388"},"PeriodicalIF":3.7,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding practitioners’ challenges and requirements in the design, implementation, and evaluation of anti-phishing interventions","authors":"Orvila Sarker , Asangi Jayatilaka , Sherif Haggag , Chelsea Liu , M. Ali Babar","doi":"10.1016/j.jss.2025.112356","DOIUrl":"10.1016/j.jss.2025.112356","url":null,"abstract":"<div><h3>Background:</h3><div>Research shows that the ineffectiveness of anti-phishing interventions can result from practitioners’ failure to consider end-users’ requirements in the intervention design, implementation, and evaluation. To assist practitioners in addressing usability issues, we reported 41 guidelines through a systematic Multi-vocal Literature Review (MLR). The usefulness of these guidelines in real-world scenarios remains uncertain until the involved challenges and requirements to implement them are investigated.</div></div><div><h3>Objective:</h3><div>(1) To investigate practitioners’ challenges in the design, implementation, and evaluation of phishing interventions in real-world settings; (2) to understand practitioners’ perspectives on our guidelines and how they can be made easily accessible to the practitioners.</div></div><div><h3>Method:</h3><div>We interviewed 18 practitioners (intervention designers, security practitioners, and C-suite employees) from 18 organizations in 6 countries.</div></div><div><h3>Results:</h3><div>(1) We identify 8 challenges in training content design, anti-phishing datasets, post-training knowledge assessment, and so on. We compare these challenges with the challenges identified from our MLR to demonstrate the ecological validity of the challenges found in MLR and derive a set of insights to overcome them; (2) we report practitioners’ feedback on our guidelines; (3) we gather actionable features on an envisioned tool to make these guidelines easily accessible. Conclusion: We provide 15 recommendations to improve the anti-phishing defense in the organisations.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"225 ","pages":"Article 112356"},"PeriodicalIF":3.7,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143479176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stack overflow's hidden nuances: How does zip code define user contribution?","authors":"Elijah Zolduoarrati, Sherlock A․ Licorish, Nigel Stanger","doi":"10.1016/j.jss.2025.112374","DOIUrl":"10.1016/j.jss.2025.112374","url":null,"abstract":"<div><div>Online communities like Stack Overflow rely on collective intelligence, with developers often incorporating its code snippets within their software repositories. Despite this relevance, concerns remain regarding issues surrounding user participation within the platform. While these phenomena have been studied in isolation, literature investigating them under a unified lens remain scarce. Our work aims to bridge this gap by operationalising metrics to represent user participation, behaviour, and community value across US states and cities. Our findings show that users from rural states tend to have higher daily posts, more votes, and produce more readable, positive-toned content with fewer typos. Those from urbanised states nonetheless obtain more question favourites, post scores, and accrue more views to both their questions and their profiles. At the city level, users from cities with prominent R&D sectors were found to curate more content and engage more actively, while cities without a strong tech presence show higher disengagements, increased likelihood of lurking, and a tendency to write longer code snippets. Qualitative content analysis triangulates our findings where users from tech hubs favour technical jargon and collaborative knowledge-sharing, their posts buzzing with coding documentations, debugging pointers, and personal anecdotes. In contrast, rural users weave a tapestry of emotions, expressing hope, frustration, and contentment alongside their many questions. Our research uncovers a dynamic interplay between factors influencing user participation, behaviour, and community values. Rather than static dichotomies, these elements exhibit multifaceted influences, suggesting varying impacts from diverse factors like tech access, educational initiatives, and inherent behavioural tendencies.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112374"},"PeriodicalIF":3.7,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143427893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards cost-efficient vulnerability detection with cross-modal adversarial reprogramming","authors":"Zhenzhou Tian , Rui Qiu , Yudong Teng , Jiaze Sun , Yanping Chen , Lingwei Chen","doi":"10.1016/j.jss.2025.112365","DOIUrl":"10.1016/j.jss.2025.112365","url":null,"abstract":"<div><div>While deep learning has advanced the automatic detection of software vulnerabilities, current DL-based methods still face two major obstacles: the scarcity of vulnerable code samples and the high computational cost of training models from scratch, which, however, have been largely overlooked. This paper introduces <span>Capture</span>, a novel <u>C</u>ross-modal <u>A</u>dversarial re<u>P</u>rogramming approach <u>T</u>owards cost-efficient v<u>U</u>lne<u>R</u>ability d<u>E</u>tection, which reduces the need for well-labeled large vulnerable datasets and minimizes training time. Specifically, <span>Capture</span> first performs lexical parsing and linearization on the AST of the source code to extract structure- and type-aware token sequences. These sequences are transformed into a perturbation image by retrieving and reshaping each token’s embedding from a learnable universal perturbation dictionary. This enables a pre-trained model originally designed for image classification to be repurposed to support code vulnerability detection, with a dynamic label remapping scheme applied at the end that reassigns the model’s output to the binary vulnerability detection result. Our experiments demonstrate that <span>Capture</span> achieves detection accuracy comparable to state-of-the-art methods, while enhancing training efficiency due to its minimal quantity of parameters to update during the model training. Notably, <span>Capture</span> excels in scenarios with limited vulnerable samples, delivering superior detection accuracy and F1 scores compared to baseline methods.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112365"},"PeriodicalIF":3.7,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143402636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A GUI-based Metamorphic Testing Technique for Detecting Authentication Vulnerabilities in Android Mobile Apps","authors":"Domenico Amalfitano , Misael Júnior , Anna Rita Fasolino , Marcio Delamaro","doi":"10.1016/j.jss.2025.112364","DOIUrl":"10.1016/j.jss.2025.112364","url":null,"abstract":"<div><h3>Context:</h3><div>The increasing use of mobile apps in daily life involves managing and sharing sensitive user information.</div></div><div><h3>Problem:</h3><div>New vulnerabilities are frequently reported in bug tracking systems, highlighting the need for effective security testing processes for these applications.</div></div><div><h3>Proposal:</h3><div>This study introduces a GUI-based Metamorphic Testing technique designed to detect five common real-world vulnerabilities related to username and password authentication methods in Android applications, as identified by OWASP.</div></div><div><h3>Methods:</h3><div>We developed five Metamorphic Relationships to test for these vulnerabilities and implemented a Metamorphic Vulnerability Testing Environment to automate the technique. This environment facilitates the generation of <em>Source test case</em> and the automatic creation and execution of <em>Follow-up test case</em>.</div></div><div><h3>Results:</h3><div>The technique was applied to 163 real-world Android applications, uncovering 159 vulnerabilities. Out of these, 108 apps exhibited at least one vulnerability. The vulnerabilities were validated through expert analysis conducted by three security professionals, who confirmed the issues by interacting directly with the app’s graphical user interfaces (GUIs). Additionally, to assess the practical relevance of our approach, we engaged with 37 companies whose applications were identified as vulnerable. Nine companies confirmed the vulnerabilities, and 26 updated their apps to address the reported issues. Our findings also indicate a weak inverse correlation between user-perceived quality and vulnerabilities; even highly rated apps can harbor significant security flaws.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"224 ","pages":"Article 112364"},"PeriodicalIF":3.7,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143438141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Attention-based Wide and Deep Neural Network for Reentrancy Vulnerability Detection in Smart Contracts","authors":"Samuel Banning Osei , Rubing Huang , Zhongchen Ma","doi":"10.1016/j.jss.2025.112361","DOIUrl":"10.1016/j.jss.2025.112361","url":null,"abstract":"<div><div>In recent years, smart contracts have become integral to blockchain applications, offering decentralized, transparent, and tamper-proof execution of agreements. However, vulnerabilities in smart contracts pose significant security risks, leading to financial losses. This paper presents an Attention-based Wide and Deep Neural Network (AWDNN) for Reentrancy vulnerability Detection in Ethereum smart contracts. By emphasizing crucial smart contract features, AWDNN enhances its precision in identifying complex vulnerability patterns. Our approach includes three phases: code optimization, vectorization, and vulnerability detection. We streamline smart contract code by removing extraneous components and extracting key fragments. These fragments are transformed into vectors that capture the smart contract’s semantic features, and subsequently subjected through the wide and deep neural network to detect vulnerabilities. Experimental results show that our model performs well compared to existing tools. Future work aims to detect additional vulnerabilities and incorporate advanced vectorization techniques to enhance efficiency.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112361"},"PeriodicalIF":3.7,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143215069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SBFL fault localization considering fault-proneness","authors":"Reza Torkashvan , Saeed Parsa , Babak Vaziri","doi":"10.1016/j.jss.2025.112363","DOIUrl":"10.1016/j.jss.2025.112363","url":null,"abstract":"<div><div>Fault localization is a critical phase in software debugging, often posing significant challenges and demanding extensive time for large and complex programs. Spectrum-based fault localization (SBFL) is a straightforward and cost-effective technique that leverages program execution logs to identify faulty statements. However, the effectiveness of SBFL can be compromised by biases in the test data set, which may not uniformly cover all code features. This study demonstrates that the integration of fault-proneness scores of program classes, predicted by a machine learning model utilizing source code metrics, with the fault-suspiciousness scores of program statements, estimated by SBFL, can enhance the accuracy and efficacy of fault localization. A Random Forest model is employed to predict the fault-proneness of classes in five Java projects from the Unified-Bug-Dataset 1.2. Concurrently, three established SBFL formulas are used to compute the fault-suspiciousness of statements. Statements are ranked based on their faultiness scores, derived from a linear combination of class fault-proneness and statement fault-suspiciousness. This approach is compared with the original SBFL formulas using four evaluation metrics: F-measure, precision, recall, and accuracy. The results indicate that the proposed method surpasses the original SBFL formulas across all metrics and significantly reduces the search space for fault localization. These findings suggest that the integration of static and dynamic analysis provides a more reliable and efficient method for fault localization in software systems.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"223 ","pages":"Article 112363"},"PeriodicalIF":3.7,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}