First byte: Force-based clustering of filtered block N-grams to detect code reuse in malicious software

2013 8th International Conference on Malicious and Unwanted Software: "The Americas" (MALWARE) Pub Date : 2013-10-01 DOI:10.1109/MALWARE.2013.6703687

Jason Upchurch, Xiaobo Zhou


Citations: 9

Abstract

Detecting code reuse in malicious software is complicated by the lack of source code. The same circumstance that makes code reuse detection in malicious software desirable, namely the limited availability of original source code, also makes that detection difficult. In this paper, we propose a method for detecting code reuse in software, specifically malicious software, that moves beyond the limitations of targeting variant detection (categorization of families). This method extends n-gram analysis to target basic blocks extracted from compiled code rather than entire text sections. It also targets individual relationships between basic blocks found in localized code reuse, while preserving the ability to detect variants and families of variants found with generalized code reuse. We demonstrate the limitations of similarity calculated without first disassembling the instructions and show that our First Byte normalization gives dramatic improvements in detection of code reuse. To visualize results, we propose force-based clustering as a way to rapidly detect relationships between compiled binaries without complex analysis. Our methods retain the previously demonstrated ability of n-gram analysis to detect variants, while adding the ability to detect code reuse in non-variant malware. We show that our proposed filtering method reduces the number of similarity calculations and highlights only meaningful relationships in our malware set.
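The block-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not confirm: that First Byte normalization keeps only the first byte of each disassembled instruction, that similarity between blocks is a Jaccard index over n-gram sets, and that filtering means discarding short blocks and low-scoring pairs. The function names and the `n`, `min_len`, and `threshold` parameters are hypothetical.

```python
from itertools import combinations


def first_byte_normalize(block):
    """Assumed normalization: keep only the first byte of each instruction.

    `block` is a list of `bytes` objects, one per disassembled instruction.
    """
    return bytes(insn[0] for insn in block)


def ngrams(seq, n):
    """Return the set of length-n windows over a bytes sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def jaccard(a, b):
    """Jaccard index of two sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def normalized_similarity(block_a, block_b, n=2):
    """Similarity over n-grams of first-byte-normalized instructions."""
    return jaccard(ngrams(first_byte_normalize(block_a), n),
                   ngrams(first_byte_normalize(block_b), n))


def raw_similarity(block_a, block_b, n=2):
    """Similarity over n-grams of the raw bytes, ignoring instruction bounds."""
    return jaccard(ngrams(b"".join(block_a), n),
                   ngrams(b"".join(block_b), n))


def similar_pairs(blocks, n=2, min_len=3, threshold=0.5):
    """Drop short blocks (hypothetical filter), then keep pairs above threshold.

    `blocks` maps a block name to its list of instruction bytes.
    """
    kept = sorted(name for name, blk in blocks.items() if len(blk) >= min_len)
    edges = []
    for a, b in combinations(kept, 2):
        score = normalized_similarity(blocks[a], blocks[b], n)
        if score >= threshold:
            edges.append((a, b, score))
    return edges


# Two x86 blocks that differ only in immediates and displacements.
block_a = [b"\x55", b"\x8b\xec", b"\x83\xec\x10", b"\x8b\x45\x08"]
block_b = [b"\x55", b"\x8b\xec", b"\x83\xec\x20", b"\x8b\x45\x0c"]
block_c = [b"\x90", b"\xc3"]  # too short, removed by the filter

blocks = {"a": block_a, "b": block_b, "c": block_c}
edges = similar_pairs(blocks)
```

In this example the two longer blocks share the same instruction sequence but use different operands, so the raw byte n-grams overlap only partially while the normalized n-grams match exactly. The resulting `edges` list is the kind of weighted pair set that a force-based layout could take as input, with the score acting as the attraction between two nodes.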
