Dongdong She, Yizheng Chen, Baishakhi Ray, S. Jana
{"title":"Neutaint: Efficient Dynamic Taint Analysis with Neural Networks","authors":"Dongdong She, Yizheng Chen, Baishakhi Ray, S. Jana","doi":"10.1109/SP40000.2020.00022","DOIUrl":null,"url":null,"abstract":"Dynamic taint analysis (DTA) is widely used by various applications to track information flow during runtime execution. Existing DTA techniques use rule-based taint-propagation, which is neither accurate (i.e., high false positive rate) nor efficient (i.e., large runtime overhead). It is hard to specify taint rules for each operation while covering all corner cases correctly. Moreover, the overtaint and undertaint errors can accumulate during the propagation of taint information across multiple operations. Finally, rule-based propagation requires each operation to be inspected before applying the appropriate rules resulting in prohibitive performance overhead on large real-world applications.In this work, we propose Neutaint, a novel end-to-end approach to track information flow using neural program embeddings. The neural program embeddings model the target’s programs computations taking place between taint sources and sinks, which automatically learns the information flow by observing a diverse set of execution traces. To perform lightweight and precise information flow analysis, we utilize saliency maps to reason about most influential sources for different sinks. Neutaint constructs two saliency maps, a popular machine learning approach to influence analysis, to summarize both coarse-grained and fine-grained information flow in the neural program embeddings.We compare Neutaint with 3 state-of-the-art dynamic taint analysis tools. The evaluation results show that Neutaint can achieve 68% accuracy, on average, which is 10% improvement while reducing 40× runtime overhead over the second-best taint tool Libdft on 6 real world programs. Neutaint also achieves 61% more edge coverage when used for taint-guided fuzzing indicating the effectiveness of the identified influential bytes. We also evaluate Neutaint’s ability to detect real world software attacks. The results show that Neutaint can successfully detect different types of vulnerabilities including buffer/heap/integer overflows, division by zero, etc. Lastly, Neutaint can detect 98.7% of total flows, the highest among all taint analysis tools.","PeriodicalId":6849,"journal":{"name":"2020 IEEE Symposium on Security and Privacy (SP)","volume":"75 1","pages":"1527-1543"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP40000.2020.00022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38
Abstract
Dynamic taint analysis (DTA) is widely used by various applications to track information flow during runtime execution. Existing DTA techniques use rule-based taint-propagation, which is neither accurate (i.e., high false positive rate) nor efficient (i.e., large runtime overhead). It is hard to specify taint rules for each operation while covering all corner cases correctly. Moreover, the overtaint and undertaint errors can accumulate during the propagation of taint information across multiple operations. Finally, rule-based propagation requires each operation to be inspected before applying the appropriate rules resulting in prohibitive performance overhead on large real-world applications.In this work, we propose Neutaint, a novel end-to-end approach to track information flow using neural program embeddings. The neural program embeddings model the target’s programs computations taking place between taint sources and sinks, which automatically learns the information flow by observing a diverse set of execution traces. To perform lightweight and precise information flow analysis, we utilize saliency maps to reason about most influential sources for different sinks. Neutaint constructs two saliency maps, a popular machine learning approach to influence analysis, to summarize both coarse-grained and fine-grained information flow in the neural program embeddings.We compare Neutaint with 3 state-of-the-art dynamic taint analysis tools. The evaluation results show that Neutaint can achieve 68% accuracy, on average, which is 10% improvement while reducing 40× runtime overhead over the second-best taint tool Libdft on 6 real world programs. Neutaint also achieves 61% more edge coverage when used for taint-guided fuzzing indicating the effectiveness of the identified influential bytes. We also evaluate Neutaint’s ability to detect real world software attacks. The results show that Neutaint can successfully detect different types of vulnerabilities including buffer/heap/integer overflows, division by zero, etc. Lastly, Neutaint can detect 98.7% of total flows, the highest among all taint analysis tools.