{"title":"Integrating pattern matching and abstract interpretation for verifying cautions of microcontrollers","authors":"Thuy Nguyen, Takashi Tomita, Junpei Endo, Toshiaki Aoki","doi":"10.1002/stvr.1788","DOIUrl":"https://doi.org/10.1002/stvr.1788","url":null,"abstract":"Handling hardware‐dependent properties at a low level is usually required in developing microcontroller‐based applications. One of these hardware‐dependent properties is cautions, which are described in microcontrollers hardware manuals. The process of verifying these cautions is performed manually, as there is currently no single tool that can directly handle this task. This research aims at automating the verification of these cautions. To obtain the typical cautions of microcontrollers, we investigate two sections which have a considerable number of required cautions in the hardware manual of a popular microcontroller. Subsequently, we analyse these cautions and categorize them into several groups. Based on this analysis, we propose a semi‐automatic approach for verifying the cautions which integrates two static programme analysis techniques (i.e., pattern matching and abstract interpretation). To evaluate our approach, we conducted experiments with generated source code, benchmark source code, and industrial source code. The generated source code, which was created automatically based on several aspects of the C programme, was used to evaluate the performance of the approach based on these aspects. The benchmark and the industrial source code, which were provided by Aisin Software Co., Ltd., were used to assess the feasibility and applicability of the approach. The results show that all expected violations in the benchmark source code were detected. Unexpected but real violations in the benchmark programme were also detected. For the industrial source code, the approach successfully handled and detected most of the expected violations. These results show that the approach is promising in verifying the cautions.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"44 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85489349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing test suites of extended finite state machines against model‐ and code‐based faults","authors":"K. El-Fakih, A. Alzaatreh, Uraz Cengiz Türker","doi":"10.1002/stvr.1789","DOIUrl":"https://doi.org/10.1002/stvr.1789","url":null,"abstract":"Tests can be derived from extended finite state machine (EFSM) specifications considering the coverage of single‐transfer faults, all transitions using a transition tour, all‐uses, edge‐pair, and prime path with side trip. We provide novel empirical assessments of the effectiveness of these test suites. The first assessment determines for each pair of test suites if there is a difference between the pair in covering EFSM faults of six EFSM specifications. If the difference is found significant, we determine which test suite outperforms the other. The second assessment is similar to the first; yet, it is carried out against code faults of 12 Java implementations of the specifications. Besides, two assessments are provided to determine whether test suites have better coverage of certain classes of EFSM (or code) faults than others. The evaluation uses proper data transformation of mutation scores and p‐value adjustments for controlling Type I error due to multiple tests. Furthermore, we show that subsuming mutants have an impact on mutation scores of both EFSM and code faults; and accordingly, we use a score that removes them in order not to invalidate the obtained results. The assessments show that all‐uses tests were outperformed by all other tests; transition tours outperformed both edge‐pair and prime path with side trips; and single‐transfer fault tests outperformed all other test suites. Similar results are obtained over the considered EFSM and code fault domains, and there were no significant differences between the test suites coverage of different classes of EFSM and code faults.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"22 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77635654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: Verification, reliability and performance","authors":"R. Hierons, Tao Xie","doi":"10.1002/stvr.1790","DOIUrl":"https://doi.org/10.1002/stvr.1790","url":null,"abstract":"This issue includes three papers, covering software verification, software reliability modelling and performance assessment, respectively. The first paper, ‘Verification algebra for multi-tenant applications in VaaS architecture’, by Kai Hu, Ji Wan, Kan Luo, Yuzhuang Xu, Zijing Cheng and Wei-Tek Tsai, concerns verification in multi-tenant architectures. Multi-tenant architectures support composition of services and so the rapid development of applications. The issue addressed is the potentially massive number of possible applications formed by composing a given set of services. The authors propose a verification algebra that can determine the results of verification of new combinations of property/application on the basis of different combinations of services already verified and/or the verification of different, but related, properties. The overall approach was evaluated through simulations. (Recommended by Professor Paul Strooper) The second paper, ‘Entropy based enhanced particle swarm optimization on multi-objective software reliability modelling for optimal testing resources allocation’, by Pooja Rani and G. S. Mahapatra, concerns the optimum resource allocation problem to obtain the maximum reliability and minimum total cost under the testing effort constraint. The authors formulate a multi-objective software reliability model of testing resources for a new generalized exponential reliability function to characterize dynamic allocation of total expected cost and testing effort. The authors further propose an enhanced particle swarm optimization (EPSO) to maximize software reliability and minimize allocation cost. The authors conduct experiments to demonstrate the potential of the proposed approach to predict software reliability with greater accuracy. (Recommended by Professor Moonzoo Kim) The third paper, ‘Performance assessment based on stochastic differential equation and effort data for edge computing’, by Yoshinobu Tamura and Shigeru Yamada, concerns performance assessment based on the relationship between the cloud and edge services operated by using open-source software. The authors propose a two-dimensional stochastic differential equation model that considers the unique features with uncertainty from big data under the operation of cloud and edge services. The authors analyse actual data to show numerical examples of performance assessments considering the network connectivity as characteristics of cloud and edge services and compare the noise terms of the proposed model for actual data. (Recommended by Professor Min Xie)","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"7 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87148288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TRANSMUT‐Spark: Transformation mutation for Apache Spark","authors":"J. Neto, A. Moreira, Genoveva Vargas-Solar, M. A. Musicante","doi":"10.1002/stvr.1809","DOIUrl":"https://doi.org/10.1002/stvr.1809","url":null,"abstract":"This paper proposes TRANSMUT‐Spark for automating mutation testing of big data processing code within Spark programs. Apache Spark is an engine for big data analytics/processing that hides the inherent complexity of parallel big data programming. Nonetheless, programmers must cleverly combine Spark built‐in functions within programs and guide the engine to use the right data management strategies to exploit the computational resources required by big data processing and avoid substantial production losses. Many programming details in Spark data processing code are prone to false statements that must be correctly and automatically tested. This paper explores the application of mutation testing in Spark programs, a fault‐based testing technique that relies on fault simulation to evaluate and design test sets. The paper introduces TRANSMUT‐Spark for testing Spark programs by automating the most laborious steps of the process and fully executing the mutation testing process. The paper describes how the TRANSMUT‐Spark automates the mutant generation, test execution and adequacy analysis phases of mutation testing. It also discusses the results of experiments to validate the tool and argues its scope and limitations.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"93 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87502544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: Testing, Debugging, and Defect Prediction","authors":"R. Hierons, Tao Xie","doi":"10.1002/stvr.1775","DOIUrl":"https://doi.org/10.1002/stvr.1775","url":null,"abstract":"This issue includes four papers, covering performance mutation testing, performance regression localization, fault detection and localization, and defect prediction, respectively. The first paper, by Pedro Delgado-Pérez, Ana Belén Sánchez, Sergio Segura and Inmaculada Medina-Bulo, concerns feasibility of applying performance mutation testing (i.e. applying mutation testing to assess performance tests) at the source-code level in general-purpose languages. To successfully apply performance mutation testing, the authors find it necessary to design specific mutation operators and mechanisms to evaluate the outputs. The authors define and evaluate seven new performance mutation operators to model known bug-inducing patterns. The authors report the results of experimental evaluation on open-source C++ programs. (Recommended by Professor Hyunsook Do) The second paper, by Frolin S. Ocariza Jr. and Boyang Zhao, considers the problem of finding the causes of performance regression in software. Here, a performance regression is an increase in response time as a result of changes to the software. The paper describes a design, called ZAM, that automates the process of comparing execution timelines collected from web applications. Such timelines are used as the basis for finding the causes of performance regression. A number of challenges are introduced by the context in which, for example, timing information is typically noisy. The authors report the results of experimental evaluation and also experience in using the approach. (Recommended by Professor T. H. Tse) The third paper, by Rawad Abou Assi, Wes Masri and Chadi Trad, concerns coincidental correctness and its impact on fault detection and localization. The authors consider weak coincidental correctness, in which a faulty statement is executed but this does not lead to an infected state. They also consider strong coincidental correctness, in which the execution of a faulty statement leads to an infected state but does not lead to incorrect output. The authors empirically investigated the effect of coincidental correctness on three classes of technique: spectrum-based fault localization (SBFL), test suite reduction (TSR) and test case prioritization (TCP). Interestingly, there was significant variation with, for example, evidence that coincidental correctness has a greater impact on TSR and TCP than on SBFL. (Recommended by Professor Hyunsook Do) The fourth paper, by Zeinab Eivazpour and Mohammad Reza Keyvanpour, concerns the cost issue when handling the class imbalance problem over the training dataset in software defect prediction. The authors propose the cost-sensitive stacked generalization (CSSG) approach. This approach combines the staking ensemble learning method with cost-sensitive learning, which aims to reduce misclassification costs. 
In the CSSG approach, the logistic regression classifier and extra randomized trees ensemble method in cost-sensitive learning and cost-insensitive conditions ar","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"6 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90430547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model checking C++ programs","authors":"Felipe R. Monteiro, M. R. Gadelha, L. Cordeiro","doi":"10.1002/stvr.1793","DOIUrl":"https://doi.org/10.1002/stvr.1793","url":null,"abstract":"In the last three decades, memory safety issues in system programming languages such as C or C++ have been one of the most significant sources of security vulnerabilities. However, there exist only a few attempts with limited success to cope with the complexity of C++ program verification. We describe and evaluate a novel verification approach based on bounded model checking (BMC) and satisfiability modulo theories (SMT) to verify C++ programs. Our verification approach analyses bounded C++ programs by encoding into SMT various sophisticated features that the C++ programming language offers, such as templates, inheritance, polymorphism, exception handling, and the Standard Template Libraries. We formalize these features within our formal verification framework using a decidable fragment of first‐order logic and then show how state‐of‐the‐art SMT solvers can efficiently handle that. We implemented our verification approach on top of ESBMC. We compare ESBMC to LLBMC and DIVINE, which are state‐of‐the‐art verifiers to check C++ programs directly from the LLVM bitcode. Experimental results show that ESBMC can handle a wide range of C++ programs, presenting a higher number of correct verification results. Additionally, ESBMC has been applied to a commercial C++ application in the telecommunication domain and successfully detected arithmetic‐overflow errors, which could potentially lead to security vulnerabilities.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"77 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85732473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective grey‐box testing with partial FSM models","authors":"Robert Sachtleben, J. Peleska","doi":"10.1002/stvr.1806","DOIUrl":"https://doi.org/10.1002/stvr.1806","url":null,"abstract":"For partial, nondeterministic, finite state machines, a new conformance relation called strong reduction is presented. It complements other existing conformance relations in the sense that the new relation is well suited for model‐based testing of systems whose inputs are enabled or disabled, depending on the actual system state. Examples of such systems are graphical user interfaces and systems with interfaces that can be enabled or disabled in a mechanical way. We present a new test generation algorithm producing complete test suites for strong reduction. The suites are executed according to the grey‐box testing paradigm: it is assumed that the state‐dependent sets of enabled inputs can be identified during test execution, while the implementation states remain hidden, as in black‐box testing. We show that this grey‐box information is exploited by the generation algorithm in such a way that the resulting best‐case test suite size is only linear in the state space size of the reference model. Moreover, examples show that this may lead to significant reductions of test suite size in comparison to true black‐box testing for strong reduction.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"15 13 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81637401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JUGE: An infrastructure for benchmarking Java unit test generators","authors":"Xavier Devroey, Alessio Gambi, Juan P. Galeotti, René Just, Fitsum Meshesha Kifetew, Annibale Panichella, Sebastiano Panichella","doi":"10.1002/stvr.1838","DOIUrl":"https://doi.org/10.1002/stvr.1838","url":null,"abstract":"Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit‐testing of libraries versus system‐testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large‐scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, setup the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE) supporting generators (search‐based, random‐based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co‐located with the Search‐Based Software Testing Workshop, have taken place where JUGE was used and evolved. As a result, an increasing amount of tools (over 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"120 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75786178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An ensemble‐based predictive mutation testing approach that considers impact of unreached mutants","authors":"Alireza Aghamohammadi, S. Mirian-Hosseinabadi","doi":"10.1002/stvr.1784","DOIUrl":"https://doi.org/10.1002/stvr.1784","url":null,"abstract":"Predictive mutation testing (PMT) is a technique to predict whether a mutant is killed, using machine learning approaches. Researchers have proposed various methods for PMT over the years. However, the impact of unreached mutants on PMT is not fully addressed. A mutant is unreached if the statement on which the mutant is generated is not executed by any test cases. We aim at showing that unreached mutants can inflate PMT results. Moreover, we propose an alternative approach to PMT, suggesting a different interpretation for PMT. To this end, we replicated the previous PMT research. We empirically evaluated the suggested approach on 654 Java projects provided by prior literature. Our results indicate that the performance of PMT drastically decreases in terms of area under a receiver operating characteristic curve (AUC) from 0.833 to 0.517. Furthermore, PMT performs worse than random guesses on 27% of the projects. The proposed approach improves the PMT results, achieving the average AUC value of 0.613. As a result, we recommend researchers to remove unreached mutants when reporting the results.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"75 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86146196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The IEEE 12th International Conference on Software Testing, Verification & Validation","authors":"A. Memon, Myra B. Cohen","doi":"10.1002/stvr.1773","DOIUrl":"https://doi.org/10.1002/stvr.1773","url":null,"abstract":"The IEEE 12th International Conference on Software Testing, Verification & Validation (ICST 2019) was held in Xi’an, China. The aim of the ICST conference is to bring together researchers and practitioners who study the theory, techniques, technologies, and applications that concern all aspects of software testing, verification, and validation of software systems. The program committee rigorously reviewed 110 full papers using a double-blind reviewing policy. Each paper received at least three regular reviews and went through a discussion phase where the reviewers made final decisions on each paper, each discussion being led by a meta-reviewer. Out of this process, the committee selected 31 full-length papers that appeared in the conference. These were presented over nine sessions ranging from classical topics such as test generation and test coverage to emerging topics such as machine learning and security during the main conference track. Based on the original reviewers’ feedback, we selected five papers for consideration for this special issue of STVR. These papers were extended from their conference version by the authors and were reviewed according to the standard STVR reviewing process. We thank all the ICST and STVR reviewers for their hardwork. Three papers successfully completed the reviewprocess and are contained in this special issue. The rest of this editorial provides a brief overview of these three papers. The first paper, Automated Visual Classification of DOM-based Presentation Failure Reports for Responsive Web Pages, by Ibrahim Althomali, Gregory Kapfhammer, and Phil McMinn, introduces VERVE, a tool that automatically classifies all hard to detect response layout failures (RLFs) in web applications. An empirical study reveals that VERVE’s classification of all five types of RLFs frequently agrees with classifications produced manually by humans. The second paper, BugsJS: A Benchmark and Taxonomy of JavaScript Bugs, by Péter Gyimesi, Béla Vancsics, Andrea Stocco, Davood Mazinanian, Árpád Beszédes, Rudolf Ferenc, and Ali Mesbah, presents, BugsJS, a benchmark of 453 real, manually validated JavaScript bugs from 10 popular JavaScript server-side programs, comprising 444 k LOC in total. Each bug is accompanied by its bug report, the test cases that expose it, as well as the patch that fixes it. BugJS can help facilitate reproducible empirical studies and comparisons of JavaScript analysis and testing tools. The third paper, Statically Driven Generation of Concurrent Tests for Thread-Safe Classes, by Valerio Terragni andMauro Pezzè presentsDEPCON, a novel approach that reduces the search space ofconcurrent tests by leveraging statically computeddependencies amongpublicmethods.DEPCON exploits the intuition that concurrent tests can expose thread-safety violations thatmanifest exceptions or deadlocks, only if they exercise some specific method dependencies. 
The results show that DEPCON is more effective than state-of-the-art approaches ","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"28 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81871843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}