概率拆卸

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) Pub Date : 2019-05-25 DOI:10.1109/ICSE.2019.00121

Kenneth A. Miller, Yonghwi Kwon, Yi Sun, Zhuo Zhang, X. Zhang, Zhiqiang Lin

{"title":"概率拆卸","authors":"Kenneth A. Miller, Yonghwi Kwon, Yi Sun, Zhuo Zhang, X. Zhang, Zhiqiang Lin","doi":"10.1109/ICSE.2019.00121","DOIUrl":null,"url":null,"abstract":"Disassembling stripped binaries is a prominent challenge for binary analysis, due to the interleaving of code segments and data, and the difficulties of resolving control transfer targets of indirect calls and jumps. As a result, most existing disassemblers have both false positives (FP) and false negatives (FN). We observe that uncertainty is inevitable in disassembly due to the information loss during compilation and code generation. Therefore, we propose to model such uncertainty using probabilities and propose a novel disassembly technique, which computes a probability for each address in the code space, indicating its likelihood of being a true positive instruction. The probability is computed from a set of features that are reachable to an address, including control flow and data flow features. Our experiments with more than two thousands binaries show that our technique does not have any FN and has only 3.7% FP. In comparison, a state-of-the-art superset disassembly technique has 85% FP. A rewriter built on our disassembly can generate binaries that are only half of the size of those by superset disassembly and run 3% faster. While many widely-used disassemblers such as IDA and BAP suffer from missing function entries, our experiment also shows that even without any function entry information, our disassembler can still achieve 0 FN and 6.8% FP.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"11 1","pages":"1187-1198"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":"{\"title\":\"Probabilistic Disassembly\",\"authors\":\"Kenneth A. Miller, Yonghwi Kwon, Yi Sun, Zhuo Zhang, X. Zhang, Zhiqiang Lin\",\"doi\":\"10.1109/ICSE.2019.00121\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Disassembling stripped binaries is a prominent challenge for binary analysis, due to the interleaving of code segments and data, and the difficulties of resolving control transfer targets of indirect calls and jumps. As a result, most existing disassemblers have both false positives (FP) and false negatives (FN). We observe that uncertainty is inevitable in disassembly due to the information loss during compilation and code generation. Therefore, we propose to model such uncertainty using probabilities and propose a novel disassembly technique, which computes a probability for each address in the code space, indicating its likelihood of being a true positive instruction. The probability is computed from a set of features that are reachable to an address, including control flow and data flow features. Our experiments with more than two thousands binaries show that our technique does not have any FN and has only 3.7% FP. In comparison, a state-of-the-art superset disassembly technique has 85% FP. A rewriter built on our disassembly can generate binaries that are only half of the size of those by superset disassembly and run 3% faster. While many widely-used disassemblers such as IDA and BAP suffer from missing function entries, our experiment also shows that even without any function entry information, our disassembler can still achieve 0 FN and 6.8% FP.\",\"PeriodicalId\":6736,\"journal\":{\"name\":\"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)\",\"volume\":\"11 1\",\"pages\":\"1187-1198\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"43\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSE.2019.00121\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2019.00121","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 43

摘要

由于代码段和数据的交错，以及解决间接调用和跳转的控制转移目标的困难，对剥离二进制文件的反汇编是二进制分析的一个突出挑战。因此，大多数现有的反汇编程序同时具有假阳性(FP)和假阴性(FN)。我们观察到，由于编译和代码生成过程中的信息丢失，不确定性在反汇编中是不可避免的。因此，我们建议使用概率对这种不确定性进行建模，并提出一种新的反汇编技术，该技术计算代码空间中每个地址的概率，表明其成为真正指令的可能性。概率是从一组可到达地址的特征(包括控制流和数据流特征)中计算出来的。我们对两千多个二进制的实验表明，我们的技术没有任何FN，只有3.7%的FP。相比之下，最先进的超集拆卸技术具有85%的FP。建立在我们的反汇编上的重写器可以生成的二进制文件只有超集反汇编的一半大小，运行速度快3%。虽然许多广泛使用的反汇编器(如IDA和BAP)存在缺少函数条目的问题，但我们的实验也表明，即使没有任何函数条目信息，我们的反汇编器仍然可以实现0 FN和6.8% FP。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Probabilistic Disassembly

Disassembling stripped binaries is a prominent challenge for binary analysis, due to the interleaving of code segments and data, and the difficulties of resolving control transfer targets of indirect calls and jumps. As a result, most existing disassemblers have both false positives (FP) and false negatives (FN). We observe that uncertainty is inevitable in disassembly due to the information loss during compilation and code generation. Therefore, we propose to model such uncertainty using probabilities and propose a novel disassembly technique, which computes a probability for each address in the code space, indicating its likelihood of being a true positive instruction. The probability is computed from a set of features that are reachable to an address, including control flow and data flow features. Our experiments with more than two thousands binaries show that our technique does not have any FN and has only 3.7% FP. In comparison, a state-of-the-art superset disassembly technique has 85% FP. A rewriter built on our disassembly can generate binaries that are only half of the size of those by superset disassembly and run 3% faster. While many widely-used disassemblers such as IDA and BAP suffer from missing function entries, our experiment also shows that even without any function entry information, our disassembler can still achieve 0 FN and 6.8% FP.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

自引率

0.00%

发文量