{"title":"CrowdSource:使用大众训练的机器学习模型从低级符号自动推断高级恶意软件功能","authors":"Joshua Saxe, R. Turner, K. Blokhin","doi":"10.1109/MALWARE.2014.6999417","DOIUrl":null,"url":null,"abstract":"In this paper we introduce CrowdSource, a statistical natural language processing system designed to make rapid inferences about malware functionality based on printable character strings extracted from malware binaries. CrowdSource “learns” a mapping between low-level language and high-level software functionality by leveraging millions of web technical documents from StackExchange, a popular network of technical question and answer sites, using this mapping to infer malware capabilities. This paper describes our approach and provides an evaluation of its accuracy and performance, demonstrating that it can detect at least 14 high-level malware capabilities in unpacked malware binaries with an average per-capability f-score of 0.86 and at a rate of tens of thousands of binaries per day on commodity hardware.","PeriodicalId":151942,"journal":{"name":"2014 9th International Conference on Malicious and Unwanted Software: The Americas (MALWARE)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"CrowdSource: Automated inference of high level malware functionality from low-level symbols using a crowd trained machine learning model\",\"authors\":\"Joshua Saxe, R. Turner, K. Blokhin\",\"doi\":\"10.1109/MALWARE.2014.6999417\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we introduce CrowdSource, a statistical natural language processing system designed to make rapid inferences about malware functionality based on printable character strings extracted from malware binaries. CrowdSource “learns” a mapping between low-level language and high-level software functionality by leveraging millions of web technical documents from StackExchange, a popular network of technical question and answer sites, using this mapping to infer malware capabilities. This paper describes our approach and provides an evaluation of its accuracy and performance, demonstrating that it can detect at least 14 high-level malware capabilities in unpacked malware binaries with an average per-capability f-score of 0.86 and at a rate of tens of thousands of binaries per day on commodity hardware.\",\"PeriodicalId\":151942,\"journal\":{\"name\":\"2014 9th International Conference on Malicious and Unwanted Software: The Americas (MALWARE)\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 9th International Conference on Malicious and Unwanted Software: The Americas (MALWARE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MALWARE.2014.6999417\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 9th International Conference on Malicious and Unwanted Software: The Americas (MALWARE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MALWARE.2014.6999417","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
CrowdSource: Automated inference of high level malware functionality from low-level symbols using a crowd trained machine learning model
In this paper we introduce CrowdSource, a statistical natural language processing system designed to make rapid inferences about malware functionality based on printable character strings extracted from malware binaries. CrowdSource “learns” a mapping between low-level language and high-level software functionality by leveraging millions of web technical documents from StackExchange, a popular network of technical question and answer sites, using this mapping to infer malware capabilities. This paper describes our approach and provides an evaluation of its accuracy and performance, demonstrating that it can detect at least 14 high-level malware capabilities in unpacked malware binaries with an average per-capability f-score of 0.86 and at a rate of tens of thousands of binaries per day on commodity hardware.