JUGE: An infrastructure for benchmarking Java unit test generators
Xavier Devroey, Alessio Gambi, Juan P. Galeotti, René Just, Fitsum Meshesha Kifetew, Annibale Panichella, Sebastiano Panichella
Software Testing Verification & Reliability
DOI: 10.1002/stvr.1838 · Published: 2021-06-14
Citations: 4
Abstract
Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit-testing of libraries versus system-testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the one most suited to their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large-scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE), which supports generators (search-based, random-based, symbolic-execution-based, etc.) that seek to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co-located with the Search-Based Software Testing Workshop, have taken place, during which JUGE was used and evolved. As a result, an increasing number of tools (over 10) from academia and industry have been evaluated on JUGE, have matured over the years, and have enabled the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.
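To make the benchmarking workflow described in the abstract concrete, the sketch below shows one way a unit test generator could be wrapped for an infrastructure like JUGE: the harness hands the tool a class under test, a classpath, and a time budget, and collects the generated JUnit sources for later compilation, execution, and coverage or mutation analysis. This is a minimal illustrative sketch; the names and signatures (TestGenerator, generateTests) are assumptions made for this note, not JUGE's actual API.

import java.nio.file.Path;
import java.time.Duration;
import java.util.List;

/**
 * Hypothetical adapter contract that a unit test generator could implement
 * to be driven by a benchmarking harness such as JUGE. All names and
 * signatures here are illustrative assumptions, not JUGE's actual API.
 */
public interface TestGenerator {

    /**
     * Generates JUnit test sources for a single class under test within the
     * given time budget and writes them to the output directory.
     *
     * @param classUnderTest fully qualified name of the benchmark class
     * @param classPath      entries needed to load the class under test
     * @param timeBudget     wall-clock budget granted by the harness
     * @param outputDir      directory receiving the generated test sources
     * @return paths of the generated JUnit source files
     */
    List<Path> generateTests(String classUnderTest,
                             List<Path> classPath,
                             Duration timeBudget,
                             Path outputDir) throws Exception;
}

A harness built around such a contract would invoke each registered generator on the same benchmark classes with the same budgets, then compile and run the produced tests to compute comparable metrics (e.g., line coverage or mutation score); standardizing this loop is the effort reduction the abstract refers to.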
About the journal:
The journal is the premier outlet for research results on the subjects of testing, verification and reliability. Readers will find useful research on issues pertaining to building better software and evaluating it.
The journal is unique in its emphasis on theoretical foundations and applications to real-world software development. The balance of theory, empirical work, and practical applications provides readers with better techniques for testing, verifying and improving the reliability of software.
The journal targets researchers, practitioners, educators and students who have a vested interest in results generated by high-quality testing, verification and reliability modeling and evaluation of software. Topics of special interest include, but are not limited to:
-New criteria for software testing and verification
-Application of existing software testing and verification techniques to new types of software, including web applications, web services, embedded software, aspect-oriented software, and software architectures
-Model-based testing
-Formal verification techniques such as model-checking
-Comparison of testing and verification techniques
-Measurement of and metrics for testing, verification and reliability
-Industrial experience with cutting edge techniques
-Descriptions and evaluations of commercial and open-source software testing tools
-Reliability modeling, measurement and application
-Testing and verification of software security
-Automated test data generation
-Process issues and methods
-Non-functional testing