Neutaint: Efficient Dynamic Taint Analysis with Neural Networks

2020 IEEE Symposium on Security and Privacy (SP) | Pub Date: 2019-07-08 | DOI: 10.1109/SP40000.2020.00022

Dongdong She, Yizheng Chen, Baishakhi Ray, S. Jana

Pages: 1527-1543

Citations: 38

Abstract

Dynamic taint analysis (DTA) is widely used by various applications to track information flow during runtime execution. Existing DTA techniques use rule-based taint propagation, which is neither accurate (i.e., it has a high false positive rate) nor efficient (i.e., it incurs large runtime overhead). It is hard to specify taint rules for each operation while covering all corner cases correctly. Moreover, overtaint and undertaint errors can accumulate as taint information propagates across multiple operations. Finally, rule-based propagation requires each operation to be inspected before the appropriate rules are applied, resulting in prohibitive performance overhead on large real-world applications.

In this work, we propose Neutaint, a novel end-to-end approach to track information flow using neural program embeddings. The neural program embeddings model the target program's computations taking place between taint sources and sinks, and automatically learn the information flow by observing a diverse set of execution traces. To perform lightweight and precise information flow analysis, we utilize saliency maps to reason about the most influential sources for different sinks. Neutaint constructs two saliency maps, a popular machine learning approach to influence analysis, to summarize both coarse-grained and fine-grained information flow in the neural program embeddings.

We compare Neutaint with 3 state-of-the-art dynamic taint analysis tools. The evaluation results show that, on 6 real-world programs, Neutaint achieves 68% accuracy on average, a 10% improvement over the second-best taint tool, Libdft, while reducing runtime overhead by 40×. Neutaint also achieves 61% more edge coverage when used for taint-guided fuzzing, indicating the effectiveness of the identified influential bytes. We also evaluate Neutaint's ability to detect real-world software attacks. The results show that Neutaint can successfully detect different types of vulnerabilities, including buffer/heap/integer overflows, division by zero, etc. Lastly, Neutaint can detect 98.7% of total flows, the highest among all taint analysis tools.
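To make the saliency-map idea in the abstract concrete, here is a minimal Python sketch. It is not Neutaint's implementation: the tiny two-layer network, its hand-picked weights, and all function names are hypothetical, and reading "fine-grained" as per-sink gradients and "coarse-grained" as gradients summed over all sinks is an assumption made for illustration. The sketch treats input bytes as taint sources and network outputs as sinks, then ranks source bytes by the magnitude of the gradient of each sink with respect to each byte.

```python
import math

def forward(x, W1, W2):
    """Toy network: input bytes -> tanh hidden layer -> sigmoid sink outputs."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [1.0 / (1.0 + math.exp(-sum(w * hj for w, hj in zip(row, h))))
         for row in W2]
    return h, y

def jacobian(x, W1, W2):
    """Return J with J[k][i] = d(sink k) / d(input byte i), via the chain rule."""
    h, y = forward(x, W1, W2)
    J = []
    for k, row in enumerate(W2):
        dy = y[k] * (1.0 - y[k])  # derivative of the sigmoid at sink k
        J.append([
            dy * sum(row[j] * (1.0 - h[j] ** 2) * W1[j][i]
                     for j in range(len(h)))
            for i in range(len(x))
        ])
    return J

def fine_grained_saliency(J, sink):
    """Assumed fine-grained view: influence of each byte on one chosen sink."""
    return [abs(g) for g in J[sink]]

def coarse_grained_saliency(J):
    """Assumed coarse-grained view: influence of each byte summed over all sinks."""
    return [sum(abs(J[k][i]) for k in range(len(J))) for i in range(len(J[0]))]

def top_bytes(saliency, n):
    """Indices of the n most influential input bytes, most influential first."""
    return sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)[:n]

# Hypothetical weights: sink 0 depends on bytes 0 and 3, sink 1 on byte 2,
# and byte 1 influences nothing.
W1 = [[0.8, 0.0, 0.0, 0.1],
      [0.0, 0.0, 0.9, 0.0]]
W2 = [[1.0, 0.0],
      [0.0, 1.0]]
x = [0.5, 0.2, 0.1, 0.9]  # input bytes scaled to [0, 1]

J = jacobian(x, W1, W2)
print("bytes ranked for sink 0:", top_bytes(fine_grained_saliency(J, 0), 2))
print("bytes ranked over all sinks:", top_bytes(coarse_grained_saliency(J), 2))
```

In this toy setting, a byte whose gradient is zero for every sink would be reported as carrying no flow, while the highest-ranked bytes are the candidates a taint-guided fuzzer would mutate first. A learned model would replace the hand-picked weights with parameters trained on execution traces.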

