Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures
Abdullah A. Qasem, M. Debbabi, Bernard Lebel, Marthe Kassouf
Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security, 2023-07-10
DOI: 10.1145/3579856.3582818 (https://doi.org/10.1145/3579856.3582818)
Citation count: 1
Abstract
Binary function clone search is an essential capability that supports many applications, including reverse engineering, patch security inspection, threat analysis, and vulnerable function detection. As a result, there has been a surge of interest in designing and implementing techniques that measure function similarity in binary executables and firmware images. Although existing approaches have merit in fingerprinting function clones, they fall short when the target binary code has undergone significant transformation through obfuscation, compiler optimization, and/or cross-compilation to multiple CPU architectures. To address this, we design and implement a system named BinFinder, which employs a neural network to learn binary function embeddings from a set of extracted features that are resilient to both code obfuscation and compiler optimization. Our experimental evaluation indicates that BinFinder outperforms state-of-the-art approaches for multi-CPU architectures by a large margin, with 46% higher Recall than Gemini, 55% higher Recall than SAFE, and 28% higher Recall than GMN. Among clone search approaches that target obfuscation and compiler optimization, BinFinder achieves higher Recall than both asm2vec (a single-CPU-architecture approach) and BinMatch (a multi-CPU-architecture approach). Finally, our work is the first to provide noteworthy results for binary clone search over Tigress, a well-established open-source obfuscator.
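To make the search setting and the Recall metric concrete, the following is a minimal sketch of embedding-based clone search. It is not BinFinder's implementation: the abstract does not specify the features, the network, the similarity function, or the exact Recall definition. The sketch assumes cosine similarity and a top-k hit criterion, and the function names and hand-made three-dimensional vectors are hypothetical stand-ins for embeddings that a trained model would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, repository, k):
    """Return the names of the k repository functions most similar to query."""
    ranked = sorted(
        repository,
        key=lambda name: cosine(query, repository[name]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(queries, repository, ground_truth, k):
    """Fraction of queries whose true clone appears among the top-k results."""
    hits = sum(
        1
        for name, emb in queries.items()
        if ground_truth[name] in search(emb, repository, k)
    )
    return hits / len(queries)


# Hypothetical toy embeddings; a real system would obtain them from a model.
repository = {
    "memcpy_x86_O0": [0.9, 0.1, 0.0],
    "strlen_x86_O0": [0.1, 0.9, 0.1],
    "md5_x86_O0": [0.0, 0.2, 0.9],
}
# Queries stand for the same functions after cross-compilation,
# optimization, or obfuscation (assumed labels, for illustration only).
queries = {
    "memcpy_arm_O3": [0.8, 0.2, 0.1],
    "strlen_arm_O3": [0.2, 0.8, 0.0],
    "md5_arm_obf": [0.6, 0.1, 0.5],
}
ground_truth = {
    "memcpy_arm_O3": "memcpy_x86_O0",
    "strlen_arm_O3": "strlen_x86_O0",
    "md5_arm_obf": "md5_x86_O0",
}

print(recall_at_k(queries, repository, ground_truth, k=1))
print(recall_at_k(queries, repository, ground_truth, k=2))
```

In this toy data the heavily transformed query `md5_arm_obf` ranks its true clone second rather than first, so two of three queries hit at k=1 and all three hit at k=2. This is the kind of gap that transformation-resilient features are meant to close.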