Automatic Discovery and Synthesis of Checksum Algorithms from Binary Data Samples

Proceedings of the 15th Workshop on Programming Languages and Analysis for Security
Pub Date: 2020-11-09
DOI: 10.1145/3411506.3417599

Lauren Labell, Jared Chandler, Kathleen Fisher

Abstract

Reverse engineering unknown binary message formats is an important part of security research. Error-detecting codes such as checksums and Cyclic Redundancy Check codes (CRCs) are commonly added to messages as a guard against corrupt or untrusted input. Before an analyst can manufacture input for software that uses checksums, they must discover the algorithm that calculates a valid checksum. To address this need, we have developed a program-synthesis-based approach for detecting and reverse-engineering checksum algorithms automatically. Our approach takes a small set of binary messages as input and automatically returns a Python implementation of the checksum algorithm if one can be found. It first searches the message space to identify the location of the checksum and then uses program synthesis to identify the operations performed on the message to compute the checksum. We return runnable code to the user that both calculates a checksum from a message and validates a message according to the checksum algorithm. We also generate unit tests, allowing the user to confirm that the synthesized checksum algorithm is correct with regard to the input messages. We created the Tufts Checksum Corpus, comprising 12 checksum inference questions collected from posts on reverse-engineering question-and-answer sites and 2 instances of common internet protocol checksums. Our approach successfully synthesized the underlying checksum algorithms for 12 of the 14 cases in our test suite.
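The two-stage idea in the abstract (locate the checksum, then identify the operations that produce it) can be illustrated with a deliberately naive sketch. This is not the authors' synthesizer: program synthesis is replaced here by brute-force enumeration over a small, hypothetical list of one-byte algorithms, and the names `CANDIDATES`, `find_checksum`, and `validate` are assumptions made for illustration. It also assumes a single checksum byte at a fixed offset, computed over all other bytes of the message.

```python
from functools import reduce
from operator import xor
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical candidate algorithms. A real synthesizer would explore a far
# richer space of operations than this fixed list.
CANDIDATES: Dict[str, Callable[[bytes], int]] = {
    "sum_mod_256": lambda data: sum(data) & 0xFF,
    "xor_all": lambda data: reduce(xor, data, 0),
    "twos_complement_sum": lambda data: (-sum(data)) & 0xFF,
    "ones_complement_sum": lambda data: (~sum(data)) & 0xFF,
}


def split(message: bytes, offset: int) -> Tuple[bytes, int]:
    """Separate a message into (bytes other than the checksum, checksum byte)."""
    idx = offset % len(message)  # normalise negative (from-the-end) offsets
    return message[:idx] + message[idx + 1:], message[idx]


def validate(message: bytes, offset: int, name: str) -> bool:
    """Check one message against a candidate (offset, algorithm) pair."""
    payload, stored = split(message, offset)
    return CANDIDATES[name](payload) == stored


def find_checksum(messages: List[bytes]) -> Optional[Tuple[int, str]]:
    """Return the first (offset, algorithm name) consistent with every message.

    Offsets counted from the end are tried first, since trailing checksums
    are common and messages may differ in length.
    """
    if not messages:
        return None
    shortest = min(len(m) for m in messages)
    offsets = list(range(-1, -shortest - 1, -1)) + list(range(shortest))
    for offset in offsets:
        for name in CANDIDATES:
            if all(validate(m, offset, name) for m in messages):
                return offset, name
    return None


# Usage: build three sample messages with an XOR checksum appended,
# then recover the checksum location and algorithm from the samples alone.
payloads = [b"\x01\x02\x03", b"\x10\x20\x30\x40", b"\xff\x00\x7f"]
samples = [p + bytes([CANDIDATES["xor_all"](p)]) for p in payloads]
print(find_checksum(samples))  # (-1, 'xor_all')
```

With only a few samples, more than one candidate can fit by coincidence, so a result from a sketch like this should be checked against additional messages, much as the generated unit tests in the paper check the synthesized algorithm against the input messages.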
