Muhan Zeng, Yiqian Wu, Zhen Ye, Yingfei Xiong, Xin Zhang, Lu Zhang
{"title":"基于程序语义高效概率建模的故障定位","authors":"Muhan Zeng, Yiqian Wu, Zhen Ye, Yingfei Xiong, Xin Zhang, Lu Zhang","doi":"10.1145/3510003.3510073","DOIUrl":null,"url":null,"abstract":"Testing-based fault localization has been a significant topic in software engineering in the past decades. It localizes a faulty program element based on a set of passing and failing test executions. Since whether a fault could be triggered and detected by a test is related to program semantics, it is crucial to model program semantics in fault localization approaches. Existing approaches either consider the full semantics of the program (e.g., mutation-based fault localization and angelic debugging), leading to scalability issues, or ignore the semantics of the program (e.g., spectrum-based fault localization), leading to imprecise localization results. Our key idea is: by modeling only the correctness of program values but not their full semantics, a balance could be reached between effectiveness and scalability. To realize this idea, we introduce a probabilistic approach to model program semantics and utilize information from static analysis and dynamic execution traces in our modeling. Our approach, SmartFL (SeMantics bAsed pRobabilisTic Fault Localization), is evaluated on a real-world dataset, Defects4J. The top-1 statement-level accuracy of our approach is 21 %, which is the best among state-of-the-art methods. The average time cost is 210 seconds per fault while existing methods that capture full semantics are often 10x or more slower.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Fault Localization via Efficient Probabilistic Modeling of Program Semantics\",\"authors\":\"Muhan Zeng, Yiqian Wu, Zhen Ye, Yingfei Xiong, Xin Zhang, Lu Zhang\",\"doi\":\"10.1145/3510003.3510073\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Testing-based fault localization has been a significant topic in software engineering in the past decades. It localizes a faulty program element based on a set of passing and failing test executions. Since whether a fault could be triggered and detected by a test is related to program semantics, it is crucial to model program semantics in fault localization approaches. Existing approaches either consider the full semantics of the program (e.g., mutation-based fault localization and angelic debugging), leading to scalability issues, or ignore the semantics of the program (e.g., spectrum-based fault localization), leading to imprecise localization results. Our key idea is: by modeling only the correctness of program values but not their full semantics, a balance could be reached between effectiveness and scalability. To realize this idea, we introduce a probabilistic approach to model program semantics and utilize information from static analysis and dynamic execution traces in our modeling. Our approach, SmartFL (SeMantics bAsed pRobabilisTic Fault Localization), is evaluated on a real-world dataset, Defects4J. The top-1 statement-level accuracy of our approach is 21 %, which is the best among state-of-the-art methods. The average time cost is 210 seconds per fault while existing methods that capture full semantics are often 10x or more slower.\",\"PeriodicalId\":202896,\"journal\":{\"name\":\"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3510003.3510073\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3510003.3510073","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Fault Localization via Efficient Probabilistic Modeling of Program Semantics
Testing-based fault localization has been a significant topic in software engineering in the past decades. It localizes a faulty program element based on a set of passing and failing test executions. Since whether a fault could be triggered and detected by a test is related to program semantics, it is crucial to model program semantics in fault localization approaches. Existing approaches either consider the full semantics of the program (e.g., mutation-based fault localization and angelic debugging), leading to scalability issues, or ignore the semantics of the program (e.g., spectrum-based fault localization), leading to imprecise localization results. Our key idea is: by modeling only the correctness of program values but not their full semantics, a balance could be reached between effectiveness and scalability. To realize this idea, we introduce a probabilistic approach to model program semantics and utilize information from static analysis and dynamic execution traces in our modeling. Our approach, SmartFL (SeMantics bAsed pRobabilisTic Fault Localization), is evaluated on a real-world dataset, Defects4J. The top-1 statement-level accuracy of our approach is 21 %, which is the best among state-of-the-art methods. The average time cost is 210 seconds per fault while existing methods that capture full semantics are often 10x or more slower.