Title: ByteFreq: Malware clustering using byte frequency
Authors: Nirmal Singh, S. S. Khurmi
Venue: 2016 5th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO)
Published: 2016-09-01
DOI: 10.1109/ICRITO.2016.7784976 (https://doi.org/10.1109/ICRITO.2016.7784976)
The growing number of malware samples has created many challenges for antivirus companies. One of these challenges is clustering the large number of malware samples they receive daily. Malware authors use malware generation kits to create different instances of the same malware, so most of these malicious samples are polymorphic instances of previously known malware families. Clustering this many samples rapidly and accurately, without spending much time processing each sample, has become a critical requirement. In this paper we propose, implement, and evaluate a method called ByteFreq that can cluster a large number of samples using byte frequency. The byte frequency is represented as a time series, and SAX (Symbolic Aggregation approXimation) [1] is used to convert the time series into a symbolic representation. We evaluated the proposed system on real-world malware samples and achieved 0.92 precision and 0.96 recall.
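
The pipeline described in the abstract (byte frequencies treated as a time series, then converted to symbols with SAX) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the 16-segment word length, the 4-letter alphabet, and the exact-match grouping step are all assumptions made for illustration; the abstract does not state the parameters used or how the symbolic words are compared.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from statistics import NormalDist, fmean, pstdev


def byte_frequency(data: bytes) -> list:
    """Relative frequency of each byte value 0..255, read as a 256-point series."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero for empty input
    return [counts.get(value, 0) / total for value in range(256)]


def sax_word(series, segments=16, alphabet="abcd") -> str:
    """Standard SAX: z-normalize, reduce with PAA, then map segment means to symbols.

    Assumes len(series) is divisible by `segments` (256 / 16 here).
    """
    mean, std = fmean(series), pstdev(series)
    # A flat series has no shape; map it to all zeros instead of dividing by 0.
    normalized = [0.0 if std == 0 else (x - mean) / std for x in series]

    # Piecewise Aggregate Approximation: mean of each equal-width segment.
    size = len(series) // segments
    paa = [fmean(normalized[i * size:(i + 1) * size]) for i in range(segments)]

    # Breakpoints split the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


def group_by_word(samples: dict) -> dict:
    """Hypothetical clustering step: samples sharing a SAX word form one group.

    Exact matching is a stand-in; the abstract does not say how words are compared.
    """
    groups = defaultdict(list)
    for name, data in samples.items():
        groups[sax_word(byte_frequency(data))].append(name)
    return dict(groups)
```

Calling `group_by_word({"a.bin": data_a, "b.bin": data_b})` with raw file contents returns a mapping from each SAX word to the sample names that produced it. A practical system would more likely compare words with a distance measure than require identical words, since small changes in the byte distribution can flip a single symbol.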