{"title":"First byte: Force-based clustering of filtered block N-grams to detect code reuse in malicious software","authors":"Jason Upchurch, Xiaobo Zhou","doi":"10.1109/MALWARE.2013.6703687","DOIUrl":null,"url":null,"abstract":"Detecting code reuse in malicious software is complicated by the lack of source code. The same circumstance that makes code reuse detection in malicious software desirable, that is, the limited availability of original source code, also contributes to the difficulty of detecting code reuse. In this paper, we propose a method for detecting code reuse in software, specifically malicious software, that moves beyond the limitations of targeting variant detection (categorization of families). This method expands n-gram analysis to target basic blocks extracted from compiled code vice entire text sections. It also targets individual relationships between basic blocks found in localized code reuse, while preserving the ability to detect variants and families of variants found with generalized code reuse. We demonstrate the limitations of similarity calculated without first disassembling the instructions and show that our First Byte normalization gives dramatic improvements in detection of code reuse. To visualize results, our method proposes force-based clustering as a solution to rapidly detect relationships between compiled binaries and detect relationships without complex analysis. Our methods retain the previously demonstrated ability of n-gram analysis to detect variants, while adding the ability to detect code reuse in non-variant malware. We show that our proposed filtering method reduces the number of similarity calculations and highlights only meaningful relationships in our malware set.","PeriodicalId":325281,"journal":{"name":"2013 8th International Conference on Malicious and Unwanted Software: \"The Americas\" (MALWARE)","volume":"58 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 8th International Conference on Malicious and Unwanted Software: \"The Americas\" (MALWARE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MALWARE.2013.6703687","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
Detecting code reuse in malicious software is complicated by the lack of source code. The same circumstance that makes code reuse detection in malicious software desirable, that is, the limited availability of original source code, also contributes to the difficulty of detecting code reuse. In this paper, we propose a method for detecting code reuse in software, specifically malicious software, that moves beyond the limitations of targeting variant detection (categorization of families). This method expands n-gram analysis to target basic blocks extracted from compiled code vice entire text sections. It also targets individual relationships between basic blocks found in localized code reuse, while preserving the ability to detect variants and families of variants found with generalized code reuse. We demonstrate the limitations of similarity calculated without first disassembling the instructions and show that our First Byte normalization gives dramatic improvements in detection of code reuse. To visualize results, our method proposes force-based clustering as a solution to rapidly detect relationships between compiled binaries and detect relationships without complex analysis. Our methods retain the previously demonstrated ability of n-gram analysis to detect variants, while adding the ability to detect code reuse in non-variant malware. We show that our proposed filtering method reduces the number of similarity calculations and highlights only meaningful relationships in our malware set.