HAformer: Semantic fusion of hex machine code and assembly code for cross-architecture binary vulnerability detection
Journal: Computers & Security (JCR Q1, Computer Science, Information Systems; impact factor 4.8)
Publication date: 2024-07-29
DOI: 10.1016/j.cose.2024.104029
URL: https://www.sciencedirect.com/science/article/pii/S0167404824003341
Citations: 0
Abstract
Binary vulnerability detection is a significant area of research in computer security. Existing methods for detecting binary vulnerabilities rely primarily on binary code similarity analysis, identifying vulnerabilities by comparing binary code for similarity. Recently, Transformer-based models have made significant progress in this field, using their strength in handling sequential data to better capture the semantics of assembly code. However, to avoid out-of-vocabulary (OOV) problems, assembly code typically has to be normalized, which discards important numerical and jump information. In this paper, we propose HAformer, a Transformer-based model that semantically fuses hexadecimal machine code and assembly code to extract richer semantic information from binary code. By combining the hexadecimal machine code with a newly designed assembly code normalization method, HAformer alleviates the loss of numerical information caused by traditional assembly code normalization, thereby addressing the OOV problem. Evaluation results show that HAformer outperforms the baseline method on the Recall@1 metric by 16.9%, 25.5%, and 19.2% in cross-optimization-level, cross-compiler, and cross-architecture settings, respectively. In real-world vulnerability detection experiments, HAformer exhibits the highest accuracy.
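As a rough illustration of two ideas the abstract relies on, the sketch below shows how a coarse normalization can collapse distinct instructions into one token sequence while their machine-code bytes stay distinct, and how a Recall@1 score can be computed from a similarity matrix. The regular expression, the `<NUM>` placeholder, and the function names `normalize` and `recall_at_1` are assumptions made for illustration only; they are not taken from the paper and do not reproduce HAformer's normalization method or evaluation code.

```python
import re

# Hypothetical coarse rule: every numeric literal becomes one placeholder.
# This mimics the kind of "traditional" normalization the abstract criticizes;
# it is NOT HAformer's normalization scheme.
_NUMERIC_LITERAL = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")


def normalize(instruction: str) -> str:
    """Replace each hex or decimal literal with the token <NUM>."""
    return _NUMERIC_LITERAL.sub("<NUM>", instruction)


def recall_at_1(similarity, true_index):
    """Fraction of queries whose highest-scoring candidate is the true match.

    similarity[i][j] is the score between query i and candidate j;
    true_index[i] is the position of the correct candidate for query i.
    """
    hits = 0
    for scores, truth in zip(similarity, true_index):
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += best == truth  # True counts as 1, False as 0
    return hits / len(true_index)


if __name__ == "__main__":
    # (machine code bytes as hex, x86 assembly) for two different instructions.
    pairs = [
        ("b801000000", "mov eax, 1"),
        ("b802000000", "mov eax, 2"),
    ]
    for hex_bytes, asm in pairs:
        # The normalized assembly is identical; only the hex bytes differ.
        print(hex_bytes, "|", normalize(asm))

    # Toy similarity matrix: 3 queries scored against 2 candidates.
    sims = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
    print(recall_at_1(sims, [0, 1, 1]))
```

In the toy pairs, both instructions normalize to the same string, so a model that sees only normalized assembly cannot tell them apart, whereas the hexadecimal bytes still carry the immediate value. That is the kind of information loss the abstract says HAformer aims to alleviate by also using the machine code.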
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.