Deceiving Deep Neural Networks-Based Binary Code Matching with Adversarial Programs

W. Wong, Huaijin Wang, Pingchuan Ma, Shuai Wang, Mingyue Jiang, T. Chen, Qiyi Tang, Sen Nie, Shi Wu

2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), October 2022. DOI: 10.1109/ICSME55016.2022.00019
Deep neural networks (DNNs) have achieved major success in solving challenging tasks such as social network analysis and image classification. Despite the rapid development of DNNs, recent research has demonstrated the feasibility of exploiting them with adversarial examples, in which a small distortion added to the input data largely misleads the DNN's predictions. Determining the similarity of two binary code snippets is the foundation for many reverse engineering, re-engineering, and security applications. Currently, the majority of binary code matching tools are based on DNNs, whose dependability has not been thoroughly studied. In this research, we present an attack that perturbs software in executable format to deceive DNN-based binary code matching. Unlike prior attacks, which mostly change non-functional code components to generate adversarial programs, our approach designs several semantics-preserving transformations that act directly on the control flow graph of the binary code, making it particularly effective at deceiving DNNs. To speed up the process, we design a framework that leverages gradient-based or hill-climbing-based optimizations to generate adversarial examples in both white-box and black-box settings. We evaluated our attack against two popular DNN-based binary code matching tools, asm2vec and ncc, and achieved reasonably high success rates. Our attack on an industrial-strength DNN-based binary code matching service, BinaryAI, shows that the proposed attack can fool remote APIs in challenging black-box settings with an average success rate of over 16.2%. Furthermore, we show that the generated adversarial programs can be used to augment the robustness of two white-box models, asm2vec and ncc, reducing attack success rates by 17.3% and 6.8%, respectively, while preserving stable, if not better, standard accuracy.
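The attack the abstract describes can be pictured as a search over semantics-preserving rewrites of a program's control flow graph (CFG). Below is a minimal Python sketch of the black-box hill-climbing variant; the toy CFG representation, the two transformations, and the `similarity` oracle are illustrative assumptions for this sketch, not the paper's actual data structures or API.

```python
import copy
import random

# A toy control flow graph: block label -> (instructions, successor labels).
# This representation is an assumption made for illustration only.
CFG = dict[str, tuple[list[str], list[str]]]


def split_block(cfg: CFG) -> CFG:
    """Split a random basic block in two, linked by a fall-through edge.

    The executed instruction sequence is unchanged (semantics preserved),
    but the graph structure the DNN encodes is different.
    """
    cfg = copy.deepcopy(cfg)
    label = random.choice(list(cfg))
    insns, succs = cfg[label]
    if len(insns) < 2:
        return cfg  # nothing to split
    cut = random.randrange(1, len(insns))
    tail = f"{label}.split{random.randrange(10**6)}"
    cfg[label] = (insns[:cut], [tail])
    cfg[tail] = (insns[cut:], succs)
    return cfg


def add_opaque_branch(cfg: CFG) -> CFG:
    """Add a never-taken conditional branch to a fresh dead block.

    `cmp x, x` followed by `jne` never jumps, so run-time behavior is
    unchanged, yet the static CFG gains a node and an edge.
    """
    cfg = copy.deepcopy(cfg)
    label = random.choice(list(cfg))
    insns, succs = cfg[label]
    dead = f"dead.{random.randrange(10**6)}"
    cfg[dead] = (["nop"], list(succs))  # unreachable at run time
    cfg[label] = (insns + ["cmp x, x", f"jne {dead}"], succs + [dead])
    return cfg


TRANSFORMS = [split_block, add_opaque_branch]


def hill_climb(cfg: CFG, similarity, budget: int = 200):
    """Greedy black-box search in the spirit of the paper's attack.

    `similarity(cfg)` is a stand-in for a query to the DNN-based matcher
    (e.g., a remote API scoring the program against the target binary);
    a transformation is kept only if the matcher's score drops.
    """
    best, best_score = cfg, similarity(cfg)
    for _ in range(budget):
        candidate = random.choice(TRANSFORMS)(best)
        score = similarity(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score
```

A white-box variant would presumably rank or guide candidate transformations using gradients of the similarity score rather than sampling them at random, which matches the abstract's distinction between its gradient-based and hill-climbing-based modes.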