{"title":"Charting a Course Through Uncertain Environments: SEA Uses Past Problems to Avoid Future Failures","authors":"P. Moore, Justin Cappos, P. Frankl, Thomas Wies","doi":"10.1109/ISSRE.2019.00011","DOIUrl":"https://doi.org/10.1109/ISSRE.2019.00011","url":null,"abstract":"A common problem for developers is applications exhibiting new bugs after deployment. Many of these bugs can be traced to unexpected network, operating system, and file system differences that cause program executions that were successful in a development environment to fail once deployed. Preventing these bugs is difficult because it is impractical to test an application in every environment. Enter Simulating Environmental Anomalies (SEA), a technique that utilizes evidence of one application's failure in a given environment to generate tests that can be applied to other applications, to see whether they suffer from analogous faults. In SEA, models of unusual properties extracted from interactions between an application, A, and its environment guide simulations of another application, B, running in the anomalous environment. This reveals faults B may experience in this environment without the expense of deployment. By accumulating these anomalies, applications can be tested against an increasing set of problematic conditions. We implemented a tool called CrashSimulator, which uses SEA, and evaluated it against Linux applications selected from coreutils and the Debian popularity contest. Our tests found a total of 63 bugs in 31 applications with effects including hangs, crashes, data loss, and remote denial of service conditions.","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123153394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding and Improving Regression Test Selection in Continuous Integration","authors":"A. Shi, Peiyuan Zhao, D. Marinov","doi":"10.1109/ISSRE.2019.00031","DOIUrl":"https://doi.org/10.1109/ISSRE.2019.00031","url":null,"abstract":"Developers rely on regression testing in their continuous integration (CI) environment to find changes that introduce regression faults. While regression testing is widely practiced, it can be costly. Regression test selection (RTS) reduces the cost of regression testing by not running the tests that are unaffected by the changes. Industry has adopted module-level RTS for their CI environment, while researchers have proposed class-level RTS. In this paper, we compare module-and class-level RTS techniques in a cloud-based CI environment, Travis. We also develop and evaluate a hybrid RTS technique that combines aspects of the module-and class-level RTS techniques. We evaluate all the techniques on real Travis builds. We find that the RTS techniques do save testing time compared to running all tests (RetestAll), but the percentage of time for a full build using RTS (76.0%) is not as low as found in previous work, due to the extra overhead in a cloud-based CI environment. Moreover, we inspect test failures from RetestAll builds, and although we find that RTS techniques can miss to select failed tests, these test failures are almost all flaky test failures. As such, RTS techniques provide additional value in helping developers avoid wasting time debugging failures not related to the recent code changes. Overall, our results show that RTS can be beneficial for the developers in the CI environment, and RTS not only saves time but also avoids misleading developers by flaky test failures.","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125454815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trustworthiness Assessment of Web Applications: Approach and Experimental Study using Input Validation Coding Practices","authors":"C. Lemes, Vincent Naessens, M. Vieira","doi":"10.1109/ISSRE.2019.00050","DOIUrl":"https://doi.org/10.1109/ISSRE.2019.00050","url":null,"abstract":"The popularity of web applications and their world-wide use to support business critical operations raised the interest of hackers on exploiting security vulnerabilities to perform malicious operations. Fostering trust calls for assessment techniques that provide indicators about the quality of a web application from a security perspective. This paper studies the problem of using coding practices to characterize the trustworthiness of web applications from a security perspective. The hypothesis is that applying feasible security practices results in applications having a reduced number of unknown vulnerabilities, and can therefore be considered more trustworthy. The proposed approach is instantiated for the concrete case of input validation practices, and includes a Quality Model to compute trustworthiness scores that can be used to compare different applications or different code elements in the same application. Experimental results show that the higher scores are obtained for more secure code, suggesting that it can be used in practice to characterize trustworthiness, also providing guidance to compare and/or improve the security of web applications.","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127185006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Study of Common Challenges in Developing Deep Learning Applications","authors":"Tianyi Zhang, Cuiyun Gao, Lei Ma, Michael R. Lyu, Miryung Kim","doi":"10.1109/ISSRE.2019.00020","DOIUrl":"https://doi.org/10.1109/ISSRE.2019.00020","url":null,"abstract":"Recent advances in deep learning promote the innovation of many intelligent systems and applications such as autonomous driving and image recognition. Despite enormous efforts and investments in this field, a fundamental question remains under-investigated—what challenges do developers commonly face when building deep learning applications? To seek an answer, this paper presents a large-scale empirical study of deep learning questions in a popular Q&A website, Stack Overflow. We manually inspect a sample of 715 questions and identify seven kinds of frequently asked questions. We further build a classification model to quantify the distribution of different kinds of deep learning questions in the entire set of 39,628 deep learning questions. We find that program crashes, model migration, and implementation questions are the top three most frequently asked questions. After carefully examining accepted answers of these questions, we summarize five main root causes that may deserve attention from the research community, including API misuse, incorrect hyperparameter selection, GPU computation, static graph computation, and limited debugging and profiling support. Our results highlight the need for new techniques such as cross-framework differential testing to improve software development productivity and software reliability in deep learning.","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115021012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Symbolic Execution for Importance Analysis and Adversarial Generation in Neural Networks","authors":"D. Gopinath, Mengshi Zhang, Kaiyuan Wang, Ismet Burak Kadron, C. Pasareanu, S. Khurshid","doi":"10.1109/ISSRE.2019.00039","DOIUrl":"https://doi.org/10.1109/ISSRE.2019.00039","url":null,"abstract":"Deep Neural Networks (DNN) are increasingly used in a variety of applications, many of them with serious safety and security concerns. This paper describes DeepCheck, a new approach for validating DNNs based on core ideas from program analysis, specifically from symbolic execution. DeepCheck implements novel techniques for lightweight symbolic analysis of DNNs and applies them to address two challenging problems in DNN analysis: 1) identification of important input features and 2) leveraging those features to create adversarial inputs. Experimental results with an MNIST image classification network and a sentiment network for textual data show that DeepCheck promises to be a valuable tool for DNN analysis.","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130894813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Tale of Two Injectors: End-to-End Comparison of IR-Level and Assembly-Level Fault Injection","authors":"Lucas Palazzi, Guanpeng Li, Bo Fang, K. Pattabiraman","doi":"10.1109/ISSRE.2019.00024","DOIUrl":"https://doi.org/10.1109/ISSRE.2019.00024","url":null,"abstract":"Fault injection (FI) is a commonly used experimental technique to evaluate the resilience of software techniques for tolerating hardware faults. Software-implemented FI can be performed at different levels of abstraction in the system stack; FI performed at the compiler's intermediate representation (IR) level has the advantage that it is closer to the program being evaluated and is hence easier to derive insights from for the design of software fault-tolerance mechanisms. Unfortunately, it is not clear how accurate IR-level FI is vis-a-vis FI performed at the assembly code level, and prior work has presented contradictory findings. In this paper, we perform an analysis of said prior work, find an inconsistency in the FI methodology used in one study, and show that it results in a flawed comparison between IR-level and assembly-level FI. We further confirm this finding by performing a comprehensive evaluation of the accuracy of IR-level FI across a range of benchmark programs and compiler optimization levels. Our results show that IR-level FI is as accurate as assembly-level FI for silent data corruptions (SDCs) across different benchmarks and optimization levels.","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134036268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FluxRank: A Widely-Deployable Framework to Automatically Localizing Root Cause Machines for Software Service Failure Mitigation","authors":"Ping Liu, Yu Chen, Xiaohui Nie, Jing Zhu, Shenglin Zhang, Kaixin Sui, Ming Zhang, Dan Pei","doi":"10.1109/ISSRE.2019.00014","DOIUrl":"https://doi.org/10.1109/ISSRE.2019.00014","url":null,"abstract":"The failures of software service directly affect user experiences and service revenue. Thus operators monitor both service-level KPIs (e.g., response time) and machine-level KPIs (e.g., CPU usage) on each machine underlying the service. When a service fails, the operators must localize the root cause machines, and mitigate the failure as quickly as possible. Existing approaches have limited application due to the difficulty to obtain the required additional measurement data. As a result, failure localization is largely manual and very time-consuming. This paper presents FluxRank, a widely-deployable framework that can automatically and accurately localize the root cause machines, so that some actions can be triggered to mitigate the service failure. Our evaluation using historical cases from five real services (with tens of thousands of machines) of a top search company shows that the root cause machines are ranked top 1 (top 3) for 55 (66) cases out of 70 cases. Comparing to existing approaches, FluxRank cuts the localization time by more than 80% on average. FluxRank has been deployed online at one Internet service and six banking services for three months, and correctly localized the root cause machines as the top 1 for 55 cases out of 59 cases.","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"63 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128238837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ISSRE 2019 Program Committee","authors":"","doi":"10.1109/issre.2019.00008","DOIUrl":"https://doi.org/10.1109/issre.2019.00008","url":null,"abstract":"","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116542098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating Safety Certification Into Model-Based Testing of Safety-Critical Systems","authors":"Aiman Gannous, A. Andrews","doi":"10.1109/ISSRE.2019.00033","DOIUrl":"https://doi.org/10.1109/ISSRE.2019.00033","url":null,"abstract":"Testing plays an important role in assuring the safety of safety-critical systems (SCS). Testing SCSs should include tasks to test how the system operates in the presence of failures. With the increase of autonomous, sensing-based functionality in safety-critical systems, efficient and cost-effective testing that maximizes safety evidences has become increasingly challenging. A previously proposed framework for testing safety-critical systems called Model-Combinatorial based testing (MCbt) has the potential for addressing these challenges. MCbt is a framework that proposes an integration of model-based testing, fault analysis, and combinatorial testing to produce the maximum number of evidences for an efficient safety certification process but was never actually used to derive a specific testing approach. In this paper, we present a concrete application of MCbt with an application to a case study. The validation showed that MCbt is more efficient and produces more safety evidences compared to state-of-the-art testing approaches.","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132257543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}