Interpretation-Enabled Software Reuse Detection Based on a Multi-level Birthmark Model

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). Pub Date: 2021-03-18. DOI: 10.1109/ICSE43902.2021.00084

Xi Xu, Q. Zheng, Zheng Yan, Ming Fan, Ang Jia, Ting Liu


Citations: 12

Abstract

Software reuse, especially partial reuse, poses legal and security threats to software development. Because the source code of the software involved is usually unavailable, reuse is hard to detect in a way that can be interpreted. Moreover, current approaches suffer from poor detection accuracy and efficiency and fall far short of practical demands. To tackle these problems, we propose ISRD, an interpretation-enabled software reuse detection approach based on a multi-level birthmark model that covers the function level, the basic block level, and the instruction level. To overcome the obfuscation caused by cross-compilation, we represent function semantics with the Minimum Branch Path (MBP) and normalize instructions to extract their core semantics. To detect reused functions efficiently, we design a process called "intent search based on anchor recognition". It uses strict instruction matching and a check for identical library call invocations to find anchor functions (anchors for short), and then traverses the neighbors of the anchors to explore potentially matched function pairs. Extensive experiments on two real-world binary datasets show that ISRD is interpretable, effective, and efficient, achieving 97.2% precision and 94.8% recall. Moreover, it is resilient to cross-compilation and outperforms state-of-the-art approaches.
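The anchor-then-neighbors idea described in the abstract can be illustrated with a minimal Python sketch. This is not the ISRD implementation: the `Func` record, the operand placeholders (`REG`, `IMM`, `MEM`, `SYM`), the Jaccard similarity, the 0.6 threshold, and the choice to treat only callees as "neighbors" are all assumptions made for illustration, and the MBP representation is not modeled at all.

```python
import re
from collections import deque
from dataclasses import dataclass, field

# Hypothetical, deliberately small x86 register pattern (assumption, not from the paper).
_REG = re.compile(r"(?:[er]?[a-d]x|[er]?[sd]i|[er]?[sb]p|r(?:[89]|1[0-5])[dwb]?)", re.I)
_IMM = re.compile(r"-?(?:0x[0-9a-f]+|\d+)", re.I)

def normalize(instr: str) -> str:
    """Keep the mnemonic and replace each operand with a coarse placeholder."""
    mnemonic, _, operands = instr.strip().partition(" ")
    out = [mnemonic.lower()]
    for op in (o.strip() for o in operands.split(",")):
        if not op:
            continue
        if op.startswith("["):
            out.append("MEM")
        elif _IMM.fullmatch(op):
            out.append("IMM")
        elif _REG.fullmatch(op):
            out.append("REG")
        else:
            out.append("SYM")  # labels, call targets, anything unrecognized
    return " ".join(out)

@dataclass
class Func:
    name: str
    instrs: list
    lib_calls: frozenset = frozenset()
    callees: list = field(default_factory=list)  # neighbor function names

def signature(f: Func):
    """Strict key: exact normalized instruction sequence plus library-call set."""
    return tuple(normalize(i) for i in f.instrs), f.lib_calls

def find_anchors(src: dict, tgt: dict) -> dict:
    """Pair functions whose signatures match exactly and unambiguously."""
    index = {}
    for f in tgt.values():
        index.setdefault(signature(f), []).append(f.name)
    anchors = {}
    for f in src.values():
        hits = index.get(signature(f), [])
        if f.lib_calls and len(hits) == 1:
            anchors[f.name] = hits[0]
    return anchors

def similarity(a: Func, b: Func) -> float:
    """Jaccard similarity over sets of normalized instructions."""
    sa = {normalize(i) for i in a.instrs}
    sb = {normalize(i) for i in b.instrs}
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def expand(src: dict, tgt: dict, anchors: dict, threshold: float = 0.6) -> dict:
    """Breadth-first walk from the anchors, matching callees of matched pairs."""
    matched = dict(anchors)
    used = set(matched.values())
    queue = deque(anchors.items())
    while queue:
        s, t = queue.popleft()
        for sn in src[s].callees:
            if sn in matched:
                continue
            candidates = [tn for tn in tgt[t].callees if tn not in used]
            if not candidates:
                continue
            best = max(candidates, key=lambda tn: similarity(src[sn], tgt[tn]))
            if similarity(src[sn], tgt[best]) >= threshold:
                matched[sn] = best
                used.add(best)
                queue.append((sn, best))
    return matched
```

In this sketch, only the neighbors of already matched pairs are compared, which is what keeps the search from degenerating into an all-pairs comparison. Each matched pair can also be traced back to concrete normalized instructions and library calls, which is one plausible reading of "interpretation-enabled".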
