Xavier Devroey, Gilles Perrouin, Mike Papadakis, Axel Legay, Pierre-Yves Schobbens, P. Heymans
{"title":"Featured Model-Based Mutation Analysis","authors":"Xavier Devroey, Gilles Perrouin, Mike Papadakis, Axel Legay, Pierre-Yves Schobbens, P. Heymans","doi":"10.1145/2884781.2884821","DOIUrl":"https://doi.org/10.1145/2884781.2884821","url":null,"abstract":"Model-based mutation analysis is a powerful but expensive testing technique. We tackle its high computation cost by proposing an optimization technique that drastically speeds up the mutant execution process. Central to this approach is the Featured Mutant Model, a modelling framework for mutation analysis inspired by the software product line paradigm. It uses behavioural variability models, viz., Featured Transition Systems, which enable the optimized generation, configuration and execution of mutants. We provide results, based on models with thousands of transitions, suggesting that our technique is fast and scalable. We found that it outperforms previous approaches by several orders of magnitude and that it makes higher-order mutation practically applicable.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"203 1","pages":"655-666"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73961919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lu Xiao, Yuanfang Cai, R. Kazman, Ran Mo, Qiong Feng
{"title":"Identifying and Quantifying Architectural Debt","authors":"Lu Xiao, Yuanfang Cai, R. Kazman, Ran Mo, Qiong Feng","doi":"10.1145/2884781.2884822","DOIUrl":"https://doi.org/10.1145/2884781.2884822","url":null,"abstract":"Our prior work showed that the majority of error-prone source files in a software system are architecturally connected. Flawed architectural relations propagate defectsamong these files and accumulate high maintenance costs over time, just like debts accumulate interest. We model groups of architecturally connected files that accumulate high maintenance costs as architectural debts. To quantify such debts, we formally define architectural debt, and show how to automatically identify debts, quantify their maintenance costs, and model these costs over time. We describe a novel history coupling probability matrix for this purpose, and identify architecture debts using 4 patterns of architectural flaws shown to correlate with reduced software quality. We evaluate our approach on 7 large-scale open source projects, and show that a significant portion of total project maintenance effort is consumed by paying interest on architectural debts. The top 5 architectural debts, covering a small portion (8% to 25%) of each project's error-prone files, capture a significant portion (20% to 61%) of each project's maintenance effort. Finally, we show that our approach reveals how architectural issues evolve into debts over time.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"198200 1","pages":"488-498"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74890875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shadow of a Doubt: Testing for Divergences between Software Versions","authors":"H. Palikareva, Tomasz Kuchta, Cristian Cadar","doi":"10.1145/2884781.2884845","DOIUrl":"https://doi.org/10.1145/2884781.2884845","url":null,"abstract":"While developers are aware of the importance of comprehensively testing patches, the large effort involved in coming up with relevant test cases means that such testing rarely happens in practice. Furthermore, even when test cases are written to cover the patch, they often exercise the same behaviour in the old and the new version of the code. In this paper, we present a symbolic execution-based technique that is designed to generate test inputs that cover the new program behaviours introduced by a patch. The technique works by executing both the old and the new version in the same symbolic execution instance, with the old version shadowing the new one. During this combined shadow execution, whenever a branch point is reached where the old and the new version diverge, we generate a test case exercising the divergence and comprehensively test the new behaviours of the new version. We evaluate our technique on the Coreutils patches from the CoREBench suite of regression bugs, and show that it is able to generate test inputs that exercise newly added behaviours and expose some of the regression bugs.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"27 1","pages":"1181-1192"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82945948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nariman Mirzaei, Joshua Garcia, H. Bagheri, Alireza Sadeghi, S. Malek
{"title":"Reducing Combinatorics in GUI Testing of Android Applications","authors":"Nariman Mirzaei, Joshua Garcia, H. Bagheri, Alireza Sadeghi, S. Malek","doi":"10.1145/2884781.2884853","DOIUrl":"https://doi.org/10.1145/2884781.2884853","url":null,"abstract":"The rising popularity of Android and the GUI-driven nature of its apps have motivated the need for applicable automated GUI testing techniques. Although exhaustive testing of all possible combinations is the ideal upper bound in combinatorial testing, it is often infeasible, due to the combinatorial explosion of test cases. This paper presents TrimDroid, a framework for GUI testing of Android apps that uses a novel strategy to generate tests in a combinatorial, yet scalable, fashion. It is backed with automated program analysis and formally rigorous test generation engines. TrimDroid relies on program analysis to extract formal specifications. These specifications express the app’s behavior (i.e., control flow between the various app screens) as well as the GUI elements and their dependencies. The dependencies among the GUI elements comprising the app are used to reduce the number of combinations with the help of a solver. Our experiments have corroborated TrimDroid’s ability to achieve a comparable coverage as that possible under exhaustive GUI testing using significantly fewer test cases.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"35 1","pages":"559-570"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87160160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Performance Distributions via Probabilistic Symbolic Execution","authors":"Bihuan Chen, Yang Liu, Wei Le","doi":"10.1145/2884781.2884794","DOIUrl":"https://doi.org/10.1145/2884781.2884794","url":null,"abstract":"Analyzing performance and understanding the potential best-case, worst-case and distribution of program execution times are very important software engineering tasks. There have been model-based and program analysis-based approaches for performance analysis. Model-based approaches rely on analytical or design models derived from mathematical theories or software architecture abstraction, which are typically coarse-grained and could be imprecise. Program analysis-based approaches collect program profiles to identify performance bottlenecks, which often fail to capture the overall program performance. In this paper, we propose a performance analysis framework PerfPlotter. It takes the program source code and usage profile as inputs and generates a performance distribution that captures the input probability distribution over execution times for the program. It heuristically explores high-probability and low-probability paths through probabilistic symbolic execution. Once a path is explored, it generates and runs a set of test inputs to model the performance of the path. Finally, it constructs the performance distribution for the program. We have implemented PerfPlotter based on the Symbolic PathFinder infrastructure, and experimentally demonstrated that PerfPlotter could accurately capture the best-case, worst-case and distribution of program execution times. We also show that performance distributions can be applied to various important tasks such as performance understanding, bug validation, and algorithm selection.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"17 1","pages":"49-60"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84413225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis","authors":"Sergey Mechtaev, Jooyong Yi, Abhik Roychoudhury","doi":"10.1145/2884781.2884807","DOIUrl":"https://doi.org/10.1145/2884781.2884807","url":null,"abstract":"Since debugging is a time-consuming activity, automated program repair tools such as GenProg have garnered interest. A recent study revealed that the majority of GenProg repairs avoid bugs simply by deleting functionality. We found that SPR, a state-of-the-art repair tool proposed in 2015, still deletes functionality in their many \"plausible\" repairs. Unlike generate-and-validate systems such as GenProg and SPR, semantic analysis based repair techniques synthesize a repair based on semantic information of the program. While such semantics-based repair methods show promise in terms of quality of generated repairs, their scalability has been a concern so far. In this paper, we present Angelix, a novel semantics-based repair method that scales up to programs of similar size as are handled by search-based repair tools such as GenProg and SPR. This shows that Angelix is more scalable than previously proposed semantics based repair methods such as SemFix and DirectFix. Furthermore, our repair method can repair multiple buggy locations that are dependent on each other. Such repairs are hard to achieve using SPR and GenProg. In our experiments, Angelix generated repairs from large-scale real-world software such as wireshark and php, and these generated repairs include multi-location repairs. We also report our experience in automatically repairing the well-known Heartbleed vulnerability.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"56 1","pages":"691-701"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88362982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding and Analyzing Compiler Warning Defects","authors":"Chengnian Sun, Vu Le, Z. Su","doi":"10.1145/2884781.2884879","DOIUrl":"https://doi.org/10.1145/2884781.2884879","url":null,"abstract":"Good compiler diagnostic warnings facilitate software development as they indicate likely programming mistakes or code smells. However, due to compiler bugs, the warnings may be erroneous, superfluous or missing, even for mature production compilers like GCC and Clang. In this paper, we (1) propose the first randomized differential testing technique to detect compiler warning defects and (2) describe our extensive evaluation in finding warning defects in widely-used C compilers.At the high level, our technique starts with generating random programs to trigger compilers to emit a variety of compiler warnings, aligns the warnings from different compilers, and identifies inconsistencies as potential bugs. We develop effective techniques to overcome three specific challenges: (1) How to generate random programs, (2) how to align textual warnings, and (3) how to reduce test programs for bug reporting?Our technique is very effective — we have found and reported 60 bugs for GCC (38 confirmed, assigned or fixed) and 39 for Clang (14 confirmed or fixed). This case study not only demonstrates our technique’s effectiveness, but also highlights the need to continue improving compilers’ warning support, an essential, but rather neglected aspect of compilers.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"20 1","pages":"203-213"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84311513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems","authors":"Fan Long, M. Rinard","doi":"10.1145/2884781.2884872","DOIUrl":"https://doi.org/10.1145/2884781.2884872","url":null,"abstract":"We present the first systematic analysis of key characteristics of patch search spaces for automatic patch generation systems. We analyze sixteen different configurations of the patch search spaces of SPR and Prophet, two cur- rent state-of-the-art patch generation systems. The analysis shows that 1) correct patches are sparse in the search spaces (typically at most one correct patch per search space per defect), 2) incorrect patches that nevertheless pass all of the test cases in the validation test suite are typically orders of magnitude more abundant, and 3) leveraging information other than the test suite is therefore critical for enabling the system to successfully isolate correct patches.We also characterize a key tradeoff in the structure of the search spaces. Larger and richer search spaces that contain correct patches for more defects can actually cause systems to find fewer, not more, correct patches. We identify two reasons for this phenomenon: 1) increased validation times because of the presence of more candidate patches and 2) more incorrect patches that pass the test suite and block the discovery of correct patches. These fundamental properties, which are all characterized for the first time in this paper, help explain why past systems often fail to generate correct patches and help identify challenges, opportunities, and productive future directions for the field.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"535 1","pages":"702-713"},"PeriodicalIF":0.0,"publicationDate":"2016-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77114653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Flávio M. Medeiros, Christian Kästner, Márcio Ribeiro, Rohit Gheyi, S. Apel
{"title":"A Comparison of 10 Sampling Algorithms for Configurable Systems","authors":"Flávio M. Medeiros, Christian Kästner, Márcio Ribeiro, Rohit Gheyi, S. Apel","doi":"10.1145/2884781.2884793","DOIUrl":"https://doi.org/10.1145/2884781.2884793","url":null,"abstract":"Almost every software system provides configuration options to tailor the system to the target platform and application scenario. Often, this configurability renders the analysis of every individual system configuration infeasible. To address this problem, researchers have proposed a diverse set of sampling algorithms. We present a comparative study of 10 state-of-the-art sampling algorithms regarding their fault-detection capability and size of sample sets. The former is important to improve software quality and the latter to reduce the time of analysis. In a nutshell, we found that sampling algorithms with larger sample sets are able to detect higher numbers of faults, but simple algorithms with small sample sets, such as most-enabled-disabled, are the most efficient in most contexts. Furthermore, we observed that the limiting assumptions made in previous work influence the number of detected faults, the size of sample sets, and the ranking of algorithms. Finally, we have identified a number of technical challenges when trying to avoid the limiting assumptions, which questions the practicality of certain sampling algorithms.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"115 1","pages":"643-654"},"PeriodicalIF":0.0,"publicationDate":"2016-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80299833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DoubleTake: Fast and Precise Error Detection via Evidence-Based Dynamic Analysis","authors":"Tongping Liu, Charlie Curtsinger, E. Berger","doi":"10.1145/2884781.2884784","DOIUrl":"https://doi.org/10.1145/2884781.2884784","url":null,"abstract":"Programs written in unsafe languages like C and C++ often suffer from errors like buffer overflows, dangling pointers, and memory leaks. Dynamic analysis tools like Valgrind can detect these errors, but their overhead — primarily due to the cost of instrumenting every memory read and write — makes them too heavyweight for use in deployed applications and makes testing with them painfully slow. The result is that much deployed software remains susceptible to these bugs, which are notoriously difficult to track down.This paper presents evidence-based dynamic analysis, an approach that enables these analyses while imposing minimal overhead (under 5%), making it practical for the first time to perform these analyses in deployed settings. The key insight of evidence-based dynamic analysis is that for a class of errors, it is possible to ensure that evidence that they happened at some point in the past remains for later detection. Evidence-based dynamic analysis allows execution to proceed at nearly full speed until the end of an epoch (e.g., a heavyweight system call). It then examines program state to check for evidence that an error occurred at some time during that epoch. If so, it rolls back execution and re-executes the code with instrumentation activated to pinpoint the error.We present DoubleTake, a prototype evidence-based dynamic analysis framework. DoubleTake is practical and easy to deploy, requiring neither custom hardware, compiler, nor operating system support. We demonstrate DoubleTake’s generality and efficiency by building dynamic analyses that find buffer overflows, memory use-after-free errors, and memory leaks. Our evaluation shows that DoubleTake is efficient, imposing under 5% overhead on average, making it the fastest such system to date. It is also precise: DoubleTake pinpoints the location of these errors to the exact line and memory addresses where they occur, providing valuable debugging information to programmers.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"3 1","pages":"911-922"},"PeriodicalIF":0.0,"publicationDate":"2016-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78850022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}