Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection

Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, Sencun Zhu
{"title":"Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection","authors":"Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, Sencun Zhu","doi":"10.1145/2635868.2635900","DOIUrl":null,"url":null,"abstract":"Existing code similarity comparison methods, whether source or binary code based, are mostly not resilient to obfuscations. In the case of software plagiarism, emerging obfuscation techniques have made automated detection increasingly difficult. In this paper, we propose a binary-oriented, obfuscation-resilient method based on a new concept, longest common subsequence of semantically equivalent basic blocks, which combines rigorous program semantics with longest common subsequence based fuzzy matching. We model the semantics of a basic block by a set of symbolic formulas representing the input-output relations of the block. This way, the semantics equivalence (and similarity) of two blocks can be checked by a theorem prover. We then model the semantics similarity of two paths using the longest common subsequence with basic blocks as elements. This novel combination has resulted in strong resiliency to code obfuscation. We have developed a prototype and our experimental results show that our method is effective and practical when applied to real-world software.","PeriodicalId":250543,"journal":{"name":"Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"194","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2635868.2635900","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 194

Abstract

Existing code similarity comparison methods, whether source or binary code based, are mostly not resilient to obfuscations. In the case of software plagiarism, emerging obfuscation techniques have made automated detection increasingly difficult. In this paper, we propose a binary-oriented, obfuscation-resilient method based on a new concept, longest common subsequence of semantically equivalent basic blocks, which combines rigorous program semantics with longest common subsequence based fuzzy matching. We model the semantics of a basic block by a set of symbolic formulas representing the input-output relations of the block. This way, the semantics equivalence (and similarity) of two blocks can be checked by a theorem prover. We then model the semantics similarity of two paths using the longest common subsequence with basic blocks as elements. This novel combination has resulted in strong resiliency to code obfuscation. We have developed a prototype and our experimental results show that our method is effective and practical when applied to real-world software.
基于语义的抗混淆二进制代码相似度比较在软件剽窃检测中的应用
现有的代码相似度比较方法,无论是基于源代码还是基于二进制代码,大多不能适应混淆。在软件剽窃的情况下,新兴的混淆技术使得自动检测变得越来越困难。本文提出了一种基于语义等效基本块的最长公共子序列概念的面向二进制的抗混淆方法,该方法将严格的程序语义与基于最长公共子序列的模糊匹配相结合。我们通过一组表示块的输入-输出关系的符号公式来建模基本块的语义。这样,两个区块的语义等价(和相似度)就可以由定理证明者来检验。然后,我们使用以基本块为元素的最长公共子序列对两条路径的语义相似性进行建模。这种新颖的组合导致了对代码混淆的强大弹性。我们已经开发了一个原型,实验结果表明我们的方法在实际软件中是有效和实用的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信