Zhanqi Cui , Haochen Jin , Xiang Chen , Rongcun Wang , Xiulei Liu
{"title":"DPFuzz: A fuzz testing tool based on the guidance of defect prediction","authors":"Zhanqi Cui , Haochen Jin , Xiang Chen , Rongcun Wang , Xiulei Liu","doi":"10.1016/j.scico.2024.103170","DOIUrl":"10.1016/j.scico.2024.103170","url":null,"abstract":"<div><p>Fuzz testing is an automated testing technique that is recognized for its efficiency and scalability. Despite its advantages, the growing complexity and scale of software have made it increasingly challenging to test software adequately. If fuzz testing can prioritize resources for modules with higher defect proneness, it can effectively enhance its defect detection performance. In this paper, we introduce DPFuzz, a tool for prioritizing the resource allocation of fuzz testing. DPFuzz guides fuzz testing by calculating the fitness score, which is based on the coverage of modules with different defect proneness. DPFuzz also demonstrates the practicability of using defect prediction in software quality assurance, and experiments confirm its excellent defect detection performance.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103170"},"PeriodicalIF":1.5,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141696570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-objective differential evolution in the generation of adversarial examples","authors":"Antony Bartlett, Cynthia C.S. Liem, Annibale Panichella","doi":"10.1016/j.scico.2024.103169","DOIUrl":"10.1016/j.scico.2024.103169","url":null,"abstract":"<div><p>Adversarial examples remain a critical concern for the robustness of deep learning models, showcasing vulnerabilities to subtle input manipulations. While earlier research focused on generating such examples using white-box strategies, later research focused on gradient-based black-box strategies, as models' internals are often not accessible to external attackers. This paper extends our prior work by exploring a gradient-free search-based algorithm for adversarial example generation, with particular emphasis on differential evolution (DE). Building on top of the classic DE operators, we propose five variants of gradient-free algorithms: a single-objective approach (<figure><img></figure>), two multi-objective variations (<figure><img></figure> and <figure><img></figure>), and two many-objective strategies (<figure><img></figure> and <figure><img></figure>). Our study on five canonical image classification models shows that whilst the <figure><img></figure> variant remains the fastest approach, <figure><img></figure> consistently produces more minimal adversarial attacks (i.e., with fewer image perturbations). Moreover, we found that applying a post-process minimization to our adversarial images would further reduce the number of changes and overall delta variation (image noise).</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103169"},"PeriodicalIF":1.5,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167642324000923/pdfft?md5=0868cc1132d7cb3394667dc10d9262c7&pid=1-s2.0-S0167642324000923-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141639288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CRAG – a combinatorial testing-based generator of road geometries for ADS testing","authors":"Paolo Arcaini , Ahmet Cetinkaya","doi":"10.1016/j.scico.2024.103171","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103171","url":null,"abstract":"<div><p>Simulation-based testing of autonomous driving systems (ADS) consists of finding scenarios in which the ADS misbehaves, e.g., it leads the car to drive off the road. The road geometry is an important feature of the scenario, as it has a direct impact on the ADS, e.g., its ability to keep the car inside the driving lane. In this paper, we present <span>CRAG</span>, a road generator for ADS testing. <span>CRAG</span> uses combinatorial testing to explore high-level road configurations, and search to find concrete road geometries within these configurations. <span>CRAG</span> has been designed so that it can be easily extended in terms of the combinatorial test suite generator, search algorithms, and test goals.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103171"},"PeriodicalIF":1.5,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141607190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simona Prokić, Nikola Luburić, Jelena Slivka, Aleksandar Kovačević
{"title":"Prescriptive procedure for manual code smell annotation","authors":"Simona Prokić, Nikola Luburić, Jelena Slivka, Aleksandar Kovačević","doi":"10.1016/j.scico.2024.103168","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103168","url":null,"abstract":"<div><p>Code smells are structures in code that present potential software maintainability issues. Manually constructing high-quality datasets to train ML models for code smell detection is challenging. Inconsistent annotations, small size, unrealistic smell-to-non-smell ratios, and poor smell coverage hinder dataset quality. These issues arise mainly due to the time-consuming nature of manual annotation and annotators’ disagreements caused by ambiguous and vague smell definitions.</p><p>To address challenges related to building high-quality datasets suitable for training ML models for smell detection, we designed a prescriptive procedure for manual code smell annotation. The proposed procedure represents an extension of our previous work, aiming to support the annotation of any smell defined by Fowler. We validated the procedure by employing three annotators to annotate smells following the proposed annotation procedure.</p><p>The main contribution of this paper is a prescriptive annotation procedure that benefits the following stakeholders: annotators building high-quality smell datasets that can be used to train ML models, ML researchers building ML models for smell detection, and software engineers employing ML models to enhance software maintainability. Secondary contributions are the code smell dataset containing Data Class, Feature Envy, and Refused Bequest smells, and the DataSet Explorer tool, which supports annotators during the annotation procedure.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103168"},"PeriodicalIF":1.5,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141541063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GraphPyRec: A novel graph-based approach for fine-grained Python code recommendation","authors":"Xing Zong, Shang Zheng, Haitao Zou, Hualong Yu, Shang Gao","doi":"10.1016/j.scico.2024.103166","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103166","url":null,"abstract":"<div><p>Artificial intelligence has been widely applied in software engineering areas such as code recommendation. Significant progress has been made in code recommendation for static languages in recent years, but it remains challenging for dynamic languages like Python, as accurately determining data flows before runtime is difficult. This limitation hinders data flow analysis, affecting the performance of code recommendation methods that rely on code analysis. In this study, we propose a graph-based Python code recommendation approach (GraphPyRec) that converts source code into a graph representation capturing both semantic and dynamic information. Nodes represent semantic information, with unique rules defined for various code statements. Edges depict control flow and data flow, utilizing a child-sibling-like process and a dedicated algorithm for data transfer extraction. Alongside the graph, a bag of words is created to include essential names, and a pre-trained BERT model transforms it into vectors. These vectors are integrated into the Gated Graph Neural Network (GGNN) process of the code recommendation model, enhancing its effectiveness and accuracy. To validate the proposed method, we crawled over a million lines of code from GitHub. Experimental results show that GraphPyRec outperforms existing mainstream Python code recommendation methods, achieving Top-1, Top-5, and Top-10 accuracy rates of 68.52%, 88.92%, and 94.05%, respectively, along with a Mean Reciprocal Rank (MRR) of 0.772.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103166"},"PeriodicalIF":1.5,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141487280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special Issue on Selected Tools from the Tool Track of the 30th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2023 Tool Track)","authors":"Ying Wang , Tao Zhang , Xiapu Luo , Peng Liang","doi":"10.1016/j.scico.2024.103167","DOIUrl":"10.1016/j.scico.2024.103167","url":null,"abstract":"","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103167"},"PeriodicalIF":1.5,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141394820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"libmg: A Python library for programming graph neural networks in μG","authors":"Matteo Belenchia, Flavio Corradini, Michela Quadrini, Michele Loreti","doi":"10.1016/j.scico.2024.103165","DOIUrl":"10.1016/j.scico.2024.103165","url":null,"abstract":"<div><p>Graph neural networks have proven their effectiveness across a wide spectrum of graph-based tasks. Despite their successes, they share the same limitations as other deep learning architectures and pose additional challenges for their formal verification. To overcome these problems, we proposed a specification language, <span><math><mi>μ</mi><mi>G</mi></math></span>, that can be used to <em>program</em> graph neural networks. This language has been implemented in a Python library called <span>libmg</span> that handles the definition, compilation, visualization, and explanation of <span><math><mi>μ</mi><mi>G</mi></math></span> graph neural network models. We illustrate its usage by showing how it was used to implement a Computation Tree Logic model checker in our previous work, and evaluate its performance on the benchmarks of the Model Checking Contest. In the future, we plan to use <span><math><mi>μ</mi><mi>G</mi></math></span> to further investigate the issues of explainability and verification of graph neural networks.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103165"},"PeriodicalIF":1.3,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141398951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a framework for reliable performance evaluation in defect prediction","authors":"Xutong Liu, Shiran Liu, Zhaoqiang Guo, Peng Zhang, Yibiao Yang, Huihui Liu, Hongmin Lu, Yanhui Li, Lin Chen, Yuming Zhou","doi":"10.1016/j.scico.2024.103164","DOIUrl":"10.1016/j.scico.2024.103164","url":null,"abstract":"<div><p>Enhancing software reliability, dependability, and security requires effective identification and mitigation of defects during early development stages. Software defect prediction (SDP) models have emerged as valuable tools for this purpose. However, there is currently a lack of consensus in evaluating the predictive performance of newly proposed models, which hinders accurate measurement of progress and can lead to misleading conclusions. To tackle this challenge, we present MATTER (a fraMework towArd a consisTenT pErformance compaRison), which aims to provide reliable and consistent performance comparisons for SDP models. MATTER incorporates three key considerations. First, it establishes a global reference point, ONE (glObal baseliNe modEl), which possesses the 3S properties (Simplicity in implementation, Strong predictive ability, and Stable prediction performance), to serve as the baseline for evaluating other models. Second, it proposes using the SQA-effort-aligned threshold setting to ensure fair performance comparisons. Third, it advocates for consistent performance evaluation by adopting a set of core performance indicators that reflect the practical value of prediction models in achieving tangible progress. Through the application of MATTER to the same benchmark data sets, researchers and practitioners can obtain more accurate and meaningful insights into the performance of defect prediction models, thereby facilitating informed decision-making and improving software quality. When evaluating representative SDP models from recent years using MATTER, we observed, surprisingly, that none of these models demonstrated a notable enhancement in prediction performance compared to the simple baseline model ONE. In future studies, we strongly recommend the adoption of MATTER to assess the actual usefulness of newly proposed models, promoting reliable scientific progress in defect prediction.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103164"},"PeriodicalIF":1.3,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141408043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chi Zhang , Jinfu Chen , Saihua Cai , Wen Zhang , Rexford Nii Ayitey Sosu , Haibo Chen
{"title":"TR-Fuzz: A syntax valid tool for fuzzing C compilers","authors":"Chi Zhang , Jinfu Chen , Saihua Cai , Wen Zhang , Rexford Nii Ayitey Sosu , Haibo Chen","doi":"10.1016/j.scico.2024.103155","DOIUrl":"10.1016/j.scico.2024.103155","url":null,"abstract":"<div><p>Compilers play a critical role in current software construction. However, vulnerabilities or bugs within a compiler can pose significant challenges to ensuring the security of the resultant software. In recent years, many compilers have made use of testing techniques to address and mitigate such concerns. Fuzzing is widely used among these techniques to detect software bugs. However, when fuzzing compilers, there are still shortcomings in terms of the diversity and validity of test cases. This paper introduces TR-Fuzz, a Transformer-based fuzzing tool specifically designed for C compilers. Leveraging position embedding and multi-head attention mechanisms, TR-Fuzz establishes relationships among data, facilitating the generation of well-formed C programs for compiler testing. In addition, we use different generation strategies in the process of program generation to improve the performance of TR-Fuzz. We validate the effectiveness of TR-Fuzz through comparison with existing fuzzing tools for C compilers. The experimental results show that TR-Fuzz increases the pass rate of the generated C programs by an average of about 12% and improves the coverage of programs under test compared with the existing tools. Benefiting from the improved pass rate and coverage, we found five bugs in GCC-9.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103155"},"PeriodicalIF":1.3,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141405384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tom Lauwaerts , Stefan Marr , Christophe Scholliers
{"title":"Latch: Enabling large-scale automated testing on constrained systems","authors":"Tom Lauwaerts , Stefan Marr , Christophe Scholliers","doi":"10.1016/j.scico.2024.103157","DOIUrl":"10.1016/j.scico.2024.103157","url":null,"abstract":"<div><p>Testing is an essential part of the software development cycle. Unfortunately, testing on constrained devices is currently very challenging. First, the limited memory of constrained devices severely restricts the size of test suites. Second, the limited processing power causes test suites to execute slowly, preventing a fast feedback loop. Third, when the constrained device becomes unresponsive, it is impossible to distinguish between the test failing and the test taking very long, forcing the developer to work with timeouts. Unfortunately, timeouts can cause tests to be flaky, i.e., have unpredictable outcomes independent of code changes. Given these problems, most IoT developers rely on laborious manual testing.</p><p>In this paper, we propose the novel testing framework <em>Latch</em> (Large-scale Automated Testing on Constrained Hardware) to overcome the three main challenges of running large test suites on constrained hardware, as well as automate manual testing scenarios through a novel testing methodology based on debugger-like operations—we call this new testing approach <em>managed testing</em>.</p><p>The core idea of <em>Latch</em> is to enable testing on constrained devices without those devices maintaining the whole test suite in memory. Therefore, programmers script and run tests on a workstation, which then instructs the constrained device step by step to execute each test, thereby overcoming the memory constraints. Our testing framework further allows developers to mark tests as depending on other tests. This way, <em>Latch</em> can skip tests that depend on previously failing tests, resulting in a faster feedback loop. Finally, <em>Latch</em> addresses the issue of timeouts and flaky tests by including an analysis mode that provides feedback on timeouts and the flakiness of tests.</p><p>To illustrate the expressiveness of <em>Latch</em>, we present testing scenarios representing unit testing, integration testing, and end-to-end testing. We evaluate the performance of <em>Latch</em> by testing a virtual machine against the WebAssembly specification, with a large test suite consisting of 10,213 tests running on an ESP32 microcontroller. Our experience shows that the testing framework is expressive, reliable, and reasonably fast, making it suitable to run large test suites on constrained devices. Furthermore, the debugger-like operations make it possible to closely mimic manual testing.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"238 ","pages":"Article 103157"},"PeriodicalIF":1.5,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141414909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}