Using targeted symbolic execution for reducing false-positives in dataflow analysis

Proceedings of the 4th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis. Pub Date: 2015-06-14. DOI: 10.1145/2771284.2771285

Steven Arzt, Siegfried Rasthofer, Robert Hahn, E. Bodden

Citations: 23

Abstract

Static data flow analysis is an indispensable tool for finding potentially malicious data leaks in software programs. Programs, which nowadays often consist of millions of lines of code, have grown much too large to allow for a complete manual inspection. Nevertheless, security experts need to judge whether an application is trustworthy, developers need to find bugs, and quality experts need to assess the maturity of software products. Analysts therefore rely on automated data flow analysis tools to find candidates for suspicious leaks, which are then investigated further. While much progress has been made in this area, with a broad variety of static data flow analysis tools proposed in academia and offered commercially, the number of false alarms raised by these tools is still a concern. Many of the false alarms are reported because the analysis tool detects data flows along paths that are not realizable at runtime, e.g., due to contradictory conditions on the path. Still, every single report is a potential issue and must be reviewed by an expert, which is labor-intensive and costly. In this work, we therefore propose TASMAN, a post-analysis based on symbolic execution that removes such false data leaks along unrealizable paths from the result set. It thus greatly improves the usefulness of the results presented to the human analyst. In our experiments on DroidBench examples, TASMAN reduces the number of false positives by about 80% without pruning any true positives. Additionally, TASMAN identified false positives in real-world examples, which we confirmed by hand. With an average execution time of 5.4 seconds per alleged leak to be checked on large real-world applications, TASMAN is fast enough for practical use.
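The core idea of pruning leaks along unrealizable paths can be illustrated with a toy post-filter. The sketch below is not TASMAN's implementation; it assumes that a symbolic executor has already collected, for each reported source-to-sink path, a path condition made of simple comparisons between one integer variable and a constant. A tiny interval-based satisfiability check then stands in for the SMT solver such a tool would typically consult. All names here (`Cond`, `Leak`, `is_feasible`, `prune_infeasible`) and the example sources and sinks are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cond:
    """One branch condition on the path: `var op value` (hypothetical format)."""
    var: str
    op: str  # one of "<", "<=", ">", ">=", "==", "!="
    value: int


@dataclass
class Leak:
    """A reported source-to-sink flow plus the path condition along it."""
    source: str
    sink: str
    path_condition: list = field(default_factory=list)


def is_feasible(path_condition):
    """Return True if some integer assignment satisfies every condition.

    Each variable is tracked as an interval [lo, hi] plus a set of excluded
    values; relations between two variables are NOT supported in this toy.
    """
    lo, hi, excluded = {}, {}, {}
    for c in path_condition:
        l = lo.get(c.var, -math.inf)
        h = hi.get(c.var, math.inf)
        if c.op == "<":
            h = min(h, c.value - 1)
        elif c.op == "<=":
            h = min(h, c.value)
        elif c.op == ">":
            l = max(l, c.value + 1)
        elif c.op == ">=":
            l = max(l, c.value)
        elif c.op == "==":
            l, h = max(l, c.value), min(h, c.value)
        elif c.op == "!=":
            excluded.setdefault(c.var, set()).add(c.value)
        else:
            raise ValueError(f"unsupported operator: {c.op}")
        lo[c.var], hi[c.var] = l, h

    for var in lo:
        l, h = lo[var], hi[var]
        if l > h:
            return False  # contradictory bounds, e.g. x == 5 and x > 10
        if not math.isinf(l) and not math.isinf(h):
            size = h - l + 1
            blocked = sum(1 for v in excluded.get(var, ()) if l <= v <= h)
            if blocked >= size:
                return False  # every value in the interval is excluded
    return True


def prune_infeasible(leaks):
    """Keep only leaks whose path condition is satisfiable."""
    return [leak for leak in leaks if is_feasible(leak.path_condition)]


# Usage: x is assigned 5 before the branch, so the `x > 10` path is unrealizable.
leaks = [
    Leak("getDeviceId", "sendTextMessage", [Cond("x", "==", 5), Cond("x", ">", 10)]),
    Leak("getDeviceId", "Log.i", [Cond("x", "==", 5), Cond("x", "<", 10)]),
]
print([leak.sink for leak in prune_infeasible(leaks)])  # ['Log.i']
```

Because the filter only removes a leak when its path condition is provably unsatisfiable, it can discard false positives without ever dropping a realizable flow, which mirrors the abstract's claim of pruning no true positives. Anything the toy solver cannot express would, in a real tool, need a full constraint solver rather than this interval approximation.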

