Applying Clustering Techniques for Refining Large Data Set: Case Study on Malware
Yoon Myet Thwe, Mizuhito Ogawa, P. N. Dung
2019 International Conference on Advanced Information Technologies (ICAIT), published 2019-11-01
DOI: https://doi.org/10.1109/AITC.2019.8921088
Malware databases that collect samples from the Internet have unintentionally accumulated garbage (incomplete malware) alongside genuine malware. This paper focuses on finding such garbage in large malware datasets using binary pattern matching, and speeds up the matching by applying nested clustering as a preprocessing step. To verify the effectiveness of our method, we conduct experiments on various malware datasets. The results show that our method works efficiently while maintaining high accuracy.
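The abstract does not spell out the matching criterion or the clustering scheme, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that an "incomplete" sample is one whose bytes form a strict prefix of another sample (for example, a truncated download), and it groups samples first by a short header and then by a longer leading block so that byte-level comparison only runs inside small groups. The names `is_incomplete_copy`, `nested_clusters`, `find_garbage`, `header_len`, and `block_len` are all hypothetical.

```python
from collections import defaultdict


def is_incomplete_copy(candidate, reference):
    """Assumed criterion: candidate is a strict byte prefix of reference."""
    return len(candidate) < len(reference) and reference.startswith(candidate)


def nested_clusters(samples, header_len=4, block_len=16):
    """Group sample names coarse-to-fine by their leading bytes.

    Outer key: first header_len bytes. Inner key: first block_len bytes.
    The inner key implies the outer one, so the nesting here only
    illustrates the coarse-then-fine grouping idea.
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for name, data in samples.items():
        clusters[data[:header_len]][data[:block_len]].append(name)
    return clusters


def find_garbage(samples, header_len=4, block_len=16):
    """Return names of samples that are strict prefixes of a longer sample.

    Limitation of this sketch: a sample shorter than block_len lands in
    its own inner cluster and is never compared against longer samples.
    """
    garbage = set()
    for inner in nested_clusters(samples, header_len, block_len).values():
        for names in inner.values():
            # Shortest first, so each sample is checked against longer ones.
            ordered = sorted(names, key=lambda n: len(samples[n]))
            for i, short in enumerate(ordered):
                for longer in ordered[i + 1:]:
                    if is_incomplete_copy(samples[short], samples[longer]):
                        garbage.add(short)
                        break
    return garbage


if __name__ == "__main__":
    full = b"MZ" + bytes(range(64))
    demo = {"full": full, "cut": full[:40], "other": b"MZ" + bytes(64)}
    print(find_garbage(demo))
```

Without the grouping step, every sample would have to be compared against every other sample. With it, pairwise comparison is confined to samples that already share their leading bytes, which is where the speed-up in a scheme like this would come from.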