Binary Analysis with Architecture and Code Section Detection using Supervised Machine Learning
Bryan Beckman, Jedediah Haile
2020 IEEE Security and Privacy Workshops (SPW), published 2020-05-01
DOI: 10.1109/SPW50608.2020.00041 (https://doi.org/10.1109/SPW50608.2020.00041)
When an analyst is presented with an unknown binary, which may or may not be complete, the ability to determine information about it is critical to subsequent reverse engineering, particularly for discovering the binary's intended use and potentially malicious nature. This paper details techniques both to identify the machine architecture of a binary and to locate the important code segments within the file. The approach, which we call “What is it Binary” or WiiBin, combines byte histograms with various machine learning (ML) techniques. Byte histograms are simple to calculate and do not rely on file headers or metadata, which allows for acceptable results when only a small portion of the original file is available, e.g., when encrypted and/or compressed sections are present in a binary. Using WiiBin, we were able to determine the architecture of test binaries accurately (>80%) with as little as a 20% contiguous portion of the file present. We were also able to determine the location of code sections within a binary using the WiiBin framework. Ultimately, the more information that can be gleaned from a binary file, the easier it is to reverse engineer successfully.
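The byte-histogram idea described in the abstract can be illustrated with a minimal sketch. This is not the WiiBin implementation: the abstract does not name the classifiers used, so a simple nearest-centroid classifier stands in for the "various ML techniques". The function names, the toy training data, and the window size are all assumptions made for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def byte_histogram(data: bytes) -> List[float]:
    """Return the normalized frequency of each of the 256 byte values."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def contiguous_fragment(data: bytes, fraction: float, start: int = 0) -> bytes:
    """Return one contiguous slice covering roughly `fraction` of the input."""
    length = max(1, int(len(data) * fraction))
    return data[start:start + length]


def train_centroids(samples: Dict[str, List[bytes]]) -> Dict[str, List[float]]:
    """Average the histograms of each label's samples into one centroid.

    Stand-in for a real supervised model (assumption, not from the paper).
    """
    centroids = {}
    for label, blobs in samples.items():
        histograms = [byte_histogram(blob) for blob in blobs]
        centroids[label] = [sum(col) / len(histograms) for col in zip(*histograms)]
    return centroids


def classify(data: bytes, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    histogram = byte_histogram(data)
    return min(centroids, key=lambda label: math.dist(histogram, centroids[label]))


def scan_windows(data: bytes, centroids: Dict[str, List[float]],
                 window: int = 4096) -> List[Tuple[int, str]]:
    """Classify consecutive fixed-size windows to get a rough map of the file."""
    return [(offset, classify(data[offset:offset + window], centroids))
            for offset in range(0, len(data), window)]


# Toy byte patterns, NOT real machine code; labels are hypothetical.
training = {
    "toy_arch_a": [bytes([0x00, 0x01, 0x02, 0x03]) * 256],
    "toy_arch_b": [bytes([0xF0, 0xF1, 0xF2, 0xF3]) * 256],
}
centroids = train_centroids(training)

# Classify using only a 20% contiguous portion of an "unknown" input.
unknown = bytes([0x00, 0x01, 0x02, 0xF0]) * 100
label = classify(contiguous_fragment(unknown, 0.20), centroids)

# Map a file whose first and second halves have different byte statistics.
mixed = training["toy_arch_a"][0] * 4 + training["toy_arch_b"][0] * 4
layout = scan_windows(mixed, centroids)
```

Because the histogram uses only byte counts, the same feature can be computed on any fragment without parsing headers. The windowed scan hints at how a section-level classifier might be applied to locate code regions, though the paper's actual procedure may differ.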