VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search

Zhenhao Luo, Pengfei Wang, Baosheng Wang, Yong Tang, Wei Xie, Xu Zhou, Danjun Liu, Kai Lu
{"title":"VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search","authors":"Zhenhao Luo, Pengfei Wang, Baosheng Wang, Yong Tang, Wei Xie, Xu Zhou, Danjun Liu, Kai Lu","doi":"10.14722/ndss.2023.24415","DOIUrl":null,"url":null,"abstract":"Code reuse is widespread in software development. It brings a heavy spread of vulnerabilities, threatening software security. Unfortunately, with the development and deployment of the Internet of Things (IoT), the harms of code reuse are magnified. Binary code search is a viable way to find these hidden vulnerabilities. Facing IoT firmware images compiled by different compilers with different optimization levels from different architectures, the existing methods are hard to fit these complex scenarios. In this paper, we propose a novel intermediate representation function model, which is an architecture-agnostic model for cross-architecture binary code search. It lifts binary code into microcode and preserves the main semantics of binary functions via complementing implicit operands and pruning redundant instructions. Then, we use natural language processing techniques and graph convolutional networks to generate function embeddings. We call the combination of a compiler, architecture, and optimization level as a file environment, and take a divideand-conquer strategy to divide a similarity calculation problem of C N cross-file-environment scenarios into N − 1 embedding transferring sub-problems. We propose an entropy-based adapter to transfer function embeddings from different file environments into the same file environment to alleviate the differences caused by various file environments. To precisely identify vulnerable functions, we propose a progressive search strategy to supplement function embeddings with fine-grained features to reduce false positives caused by patched functions. We implement a prototype named VulHawk and conduct experiments under seven different tasks to evaluate its performance and robustness. The experiments show VulHawk outperforms Asm2Vec, Asteria, BinDiff, GMN, PalmTree, SAFE, and Trex.","PeriodicalId":199733,"journal":{"name":"Proceedings 2023 Network and Distributed System Security Symposium","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 2023 Network and Distributed System Security Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14722/ndss.2023.24415","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Code reuse is widespread in software development. It brings a heavy spread of vulnerabilities, threatening software security. Unfortunately, with the development and deployment of the Internet of Things (IoT), the harms of code reuse are magnified. Binary code search is a viable way to find these hidden vulnerabilities. Facing IoT firmware images compiled by different compilers with different optimization levels from different architectures, the existing methods are hard to fit these complex scenarios. In this paper, we propose a novel intermediate representation function model, which is an architecture-agnostic model for cross-architecture binary code search. It lifts binary code into microcode and preserves the main semantics of binary functions via complementing implicit operands and pruning redundant instructions. Then, we use natural language processing techniques and graph convolutional networks to generate function embeddings. We call the combination of a compiler, architecture, and optimization level as a file environment, and take a divideand-conquer strategy to divide a similarity calculation problem of C N cross-file-environment scenarios into N − 1 embedding transferring sub-problems. We propose an entropy-based adapter to transfer function embeddings from different file environments into the same file environment to alleviate the differences caused by various file environments. To precisely identify vulnerable functions, we propose a progressive search strategy to supplement function embeddings with fine-grained features to reduce false positives caused by patched functions. We implement a prototype named VulHawk and conduct experiments under seven different tasks to evaluate its performance and robustness. The experiments show VulHawk outperforms Asm2Vec, Asteria, BinDiff, GMN, PalmTree, SAFE, and Trex.
VulHawk:基于熵的二进制代码搜索的跨架构漏洞检测
代码重用在软件开发中非常普遍。它带来了大量的漏洞,威胁着软件安全。不幸的是,随着物联网(IoT)的发展和部署,代码重用的危害被放大了。二进制代码搜索是找到这些隐藏漏洞的可行方法。面对不同架构、不同优化级别的编译器编译的物联网固件映像,现有方法难以适应这些复杂场景。本文提出了一种新的中间表示函数模型,该模型是一种与体系结构无关的跨体系结构二进制码搜索模型。它将二进制代码提升为微码,并通过补充隐式操作数和修剪冗余指令来保留二进制函数的主要语义。然后,我们使用自然语言处理技术和图卷积网络来生成函数嵌入。我们将编译器、体系结构和优化层的组合称为文件环境,并采用分而征服的策略,将C - N个跨文件环境场景的相似度计算问题划分为N−1个嵌入传输子问题。我们提出了一种基于熵的适配器,将不同文件环境中的函数嵌入转移到同一文件环境中,以减轻不同文件环境造成的差异。为了精确识别脆弱函数,我们提出了一种渐进式搜索策略,用细粒度特征补充函数嵌入,以减少修补函数造成的误报。我们实现了一个名为VulHawk的原型,并在七个不同的任务下进行了实验,以评估其性能和鲁棒性。实验表明,VulHawk的性能优于Asm2Vec、Asteria、BinDiff、GMN、PalmTree、SAFE和Trex。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信