Gefei Zuo, Jiacheng Ma, Andrew Quinn, Pramod Bhatotia, Pedro Fonseca, Baris Kasikci
{"title":"Execution reconstruction: harnessing failure reoccurrences for failure reproduction","authors":"Gefei Zuo, Jiacheng Ma, Andrew Quinn, Pramod Bhatotia, Pedro Fonseca, Baris Kasikci","doi":"10.1145/3453483.3454101","DOIUrl":null,"url":null,"abstract":"Reproducing production failures is crucial for software reliability. Alas, existing bug reproduction approaches are not suitable for production systems because they are not simultaneously efficient, effective, and accurate. In this work, we survey prior techniques and show that existing approaches over-prioritize a subset of these properties, and sacrifice the remaining ones. As a result, existing tools do not enable the plethora of proposed failure reproduction use-cases (e.g., debugging, security forensics, fuzzing) for production failures. We propose Execution Reconstruction (ER), a technique that strikes a better balance between efficiency, effectiveness and accuracy for reproducing production failures. ER uses hardware-assisted control and data tracing to shepherd symbolic execution and reproduce failures. ER’s key novelty lies in identifying data values that are both inexpensive to monitor and useful for eliding the scalability limitations of symbolic execution. ER harnesses failure reoccurrences by iteratively performing tracing and symbolic execution, which reduces runtime overhead. Whereas prior production-grade techniques can only reproduce short executions, ER can reproduce any reoccuring failure. Thus, unlike existing tools, ER reproduces fully replayable executions that can power a variety of debugging and reliabilty use cases. ER incurs on average 0.3% (up to 1.1%) runtime monitoring overhead for a broad range of real-world systems, making it practical for real-world deployment.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3453483.3454101","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14
Abstract
Reproducing production failures is crucial for software reliability. Alas, existing bug reproduction approaches are not suitable for production systems because they are not simultaneously efficient, effective, and accurate. In this work, we survey prior techniques and show that existing approaches over-prioritize a subset of these properties, and sacrifice the remaining ones. As a result, existing tools do not enable the plethora of proposed failure reproduction use-cases (e.g., debugging, security forensics, fuzzing) for production failures. We propose Execution Reconstruction (ER), a technique that strikes a better balance between efficiency, effectiveness and accuracy for reproducing production failures. ER uses hardware-assisted control and data tracing to shepherd symbolic execution and reproduce failures. ER’s key novelty lies in identifying data values that are both inexpensive to monitor and useful for eliding the scalability limitations of symbolic execution. ER harnesses failure reoccurrences by iteratively performing tracing and symbolic execution, which reduces runtime overhead. Whereas prior production-grade techniques can only reproduce short executions, ER can reproduce any reoccuring failure. Thus, unlike existing tools, ER reproduces fully replayable executions that can power a variety of debugging and reliabilty use cases. ER incurs on average 0.3% (up to 1.1%) runtime monitoring overhead for a broad range of real-world systems, making it practical for real-world deployment.