BinCola: Diversity-Sensitive Contrastive Learning for Binary Code Similarity Detection

IF 6.5 · Zone 1, Computer Science · JCR Q1: COMPUTER SCIENCE, SOFTWARE ENGINEERING
Shuai Jiang;Cai Fu;Shuai He;Jianqiang Lv;Lansheng Han;Hong Hu
Journal: IEEE Transactions on Software Engineering, vol. 50, no. 10, pp. 2485-2497
Publication date: 2024-07-08
Publication type: Journal Article
DOI: 10.1109/TSE.2024.3411072
URL: https://ieeexplore.ieee.org/document/10589540/
Citations: 0

Abstract

Binary Code Similarity Detection (BCSD) is a fundamental binary analysis technique in software security. Recently, advanced deep learning algorithms have been integrated into BCSD platforms to achieve superior performance on well-known benchmarks. However, large real-world programs exhibit more complex diversity arising from different compilers, optimization levels, architectures, and even obfuscation. Existing BCSD solutions suffer from low accuracy in such complicated real-world scenarios. In this paper, we propose BinCola, a novel Transformer-based dual diversity-sensitive contrastive learning framework. It accounts for the diversity of both compiler options and candidate functions in real-world scenarios, and it uses an attention mechanism to fuse multi-granularity function features for better generality and scalability. BinCola simultaneously compares multiple candidate functions across various compilation-option scenarios to learn the differences caused by distinct compiler options and different candidate functions. We evaluate BinCola in several ways, including binary similarity detection and real-world vulnerability search in multiple application scenarios. The results show that BinCola outperforms state-of-the-art (SOTA) methods, with improvements of 2.80%, 33.62%, 22.41%, and 34.25% in cross-architecture, cross-optimization-level, cross-compiler, and cross-obfuscation scenarios, respectively.
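The abstract does not spell out BinCola's loss function. As a rough illustration of the general idea of comparing several candidate functions across several compilation options at once, a generic multi-positive contrastive (InfoNCE-style) loss might look like the sketch below. The function names, the toy 2-D embeddings, and the temperature value are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def multi_positive_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic multi-positive InfoNCE-style loss (illustrative only).

    embeddings: one vector per compiled variant of a function.
    labels:     source-function id of each variant; variants that share
                an id (same function, different compilation options)
                are treated as positives, all others as negatives.
    """
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        # Scaled similarity of the anchor to every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        for j, logit in logits.items():
            if labels[j] == labels[i]:
                # Negative log-probability of picking this positive.
                total += log_denom - logit
                count += 1
    return total / count if count else 0.0


# Toy usage: two hypothetical functions, each compiled under two options.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(multi_positive_contrastive_loss(vectors, [0, 0, 1, 1]))
```

With these toy vectors, grouping the nearby embeddings under the same function id gives a much lower loss than pairing each embedding with a distant one, which is the behavior a contrastive objective is meant to reward.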
Source journal
IEEE Transactions on Software Engineering (Engineering Technology: Engineering, Electrical & Electronic)
CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months
Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.