BinaryInferno: A Semantic-Driven Approach to Field Inference for Binary Message Formats
Jared Chandler, Adam Wick, Kathleen Fisher
Proceedings 2023 Network and Distributed System Security Symposium
DOI: 10.14722/ndss.2023.23131 (https://doi.org/10.14722/ndss.2023.23131)
Citations: 2
Abstract
We present BinaryInferno, a fully automatic tool for reverse engineering binary message formats. Given a set of messages with the same format, the tool uses an ensemble of detectors to infer a collection of partial descriptions and then automatically integrates them into a semantically meaningful description that can be used to parse future packets with the same format. As its ensemble, BinaryInferno uses a modular and extensible set of targeted detectors: detectors that identify atomic data types such as IEEE floats, timestamps, and integer length fields; detectors that find boundaries between adjacent fields using Shannon entropy; and detectors that discover variable-length sequences by searching for common serialization idioms. We evaluate BinaryInferno's performance on sets of packets drawn from 10 binary protocols. Our semantic-driven approach significantly decreases false positive rates and increases precision compared to the previous state of the art. For top-level protocols we identify field boundaries with an average precision of 0.69, an average recall of 0.73, and an average false positive rate of 0.04, significantly outperforming five other state-of-the-art protocol reverse engineering tools on the same data sets (each listed as precision, recall, false positive rate): Awre (0.18, 0.03, 0.04), FieldHunter (0.68, 0.37, 0.01), Nemesys (0.31, 0.44, 0.11), Netplier (0.29, 0.75, 0.22), and Netzob (0.57, 0.42, 0.03). We believe our improvements in precision and false positive rate represent what our target user most wants: semantically meaningful descriptions with fewer false positives.
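To make the entropy-based boundary idea concrete, the following is a minimal sketch and not BinaryInferno's actual detector, whose details the abstract does not give. It assumes that a sharp change in per-offset Shannon entropy across a set of same-format messages hints at a field boundary. The function names `column_entropy` and `candidate_boundaries` and the `min_jump` threshold are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def column_entropy(messages, offset):
    """Shannon entropy, in bits, of the byte values observed at one offset."""
    values = [m[offset] for m in messages if offset < len(m)]
    if not values:
        return 0.0
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def candidate_boundaries(messages, min_jump=1.0):
    """Offsets whose entropy differs from the previous offset by at least min_jump bits.

    Illustrative heuristic only: min_jump is an assumed threshold,
    not a value taken from the paper.
    """
    width = min(len(m) for m in messages)  # compare only offsets every message has
    entropies = [column_entropy(messages, i) for i in range(width)]
    return [
        i for i in range(1, width)
        if abs(entropies[i] - entropies[i - 1]) >= min_jump
    ]


if __name__ == "__main__":
    # Toy format: 2 constant bytes, 1 varying counter byte, 1 constant byte.
    toy = [bytes([0xAB, 0xCD, i, 0x00]) for i in range(4)]
    print(candidate_boundaries(toy))
```

On this toy input the constant bytes have zero entropy and the counter byte has 2 bits, so the sketch flags the offsets where the counter starts and ends. Real traffic is noisier, which is one reason an ensemble of complementary detectors is attractive.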
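The reported precision, recall, and false positive rate can be illustrated with the standard definitions applied to field boundaries. This is a hedged sketch: the abstract does not spell out the evaluation protocol, so treating every interior byte offset as a candidate boundary position is an assumption, and `boundary_metrics` is a hypothetical helper name.

```python
def boundary_metrics(true_bounds, inferred_bounds, message_length):
    """Return (precision, recall, false_positive_rate) for inferred field boundaries.

    Assumption: each interior offset 1..message_length-1 is one candidate
    position that either is or is not a boundary.
    """
    positions = set(range(1, message_length))
    truth = set(true_bounds) & positions
    inferred = set(inferred_bounds) & positions

    tp = len(truth & inferred)             # boundaries correctly inferred
    fp = len(inferred - truth)             # inferred boundaries that are not real
    fn = len(truth - inferred)             # real boundaries that were missed
    tn = len(positions) - tp - fp - fn     # non-boundaries correctly left alone

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, fpr


if __name__ == "__main__":
    # 10-byte message, true boundaries at 2, 4, 8; a tool infers 2, 4, 6.
    print(boundary_metrics({2, 4, 8}, {2, 4, 6}, 10))
```

Under these definitions a tool that proposes many boundaries can reach high recall while still having low precision and a high false positive rate, which is the trade-off the abstract highlights.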