Detection of Prevalent Malware Families with Deep Learning
J. W. Stokes, C. Seifert, Jerry Li, Nizar Hejazi
MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM), November 2019
DOI: 10.1109/MILCOM47813.2019.9020790
Attackers evolve their malware over time to evade detection, and the rate of change varies from family to family depending on how many resources each group devotes to its “product”. This rapid change forces anti-malware companies to devote substantial human and automated effort to combating these threats. These companies track thousands of distinct malware families and their variants, but the most prevalent families are often particularly problematic. While some companies employ many analysts to investigate these highly prevalent families and write new signatures for them, we take a different approach: we propose a new deep learning system that learns a semantic feature embedding which better discriminates the files within each of these families. Identifying files that lie close together in a metric space is the key step in malware clustering systems. The DeepSim system uses a Siamese Neural Network (SNN), an architecture that has previously shown promising results in other domains, to learn this embedding with cosine distance as the metric in the embedded feature space. Using DeepSim's SNN with two hidden layers, K-nearest neighbor (K-NN) classification achieves an error rate of 0.011%, compared with 0.42% for a baseline based on the Jaccard index, which several previously proposed systems have used to identify similar malware files.
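The abstract compares two ways of finding similar files before K-NN classification: a Jaccard-index baseline over sets of file features, and DeepSim's learned embedding compared with cosine distance. The Python sketch below illustrates both under stated assumptions. The layer sizes, ReLU activations, Gaussian initialization, the margin-based contrastive loss, and every function and class name are hypothetical choices rather than details given in the abstract, and the gradient-based training loop is omitted. What it does show is the defining Siamese property: a single encoder, with one set of weights, embeds both files of a pair.

```python
import math
import random
from collections import Counter


# Baseline: Jaccard distance between two sets of discrete file features.
def jaccard_distance(a: set, b: set) -> float:
    union = a | b
    if not union:  # two empty feature sets are treated as identical
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Cosine distance between two dense embedding vectors (0 = same direction).
def cosine_distance(u, v) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:  # convention: a zero vector counts as orthogonal
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# K-NN: majority label among the k training items closest to the query.
def k_nearest_label(query, train, k, dist):
    neighbours = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def dense(x, weights, biases):
    # Fully connected layer: `weights` holds one row per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


class SiameseEncoder:
    """Hypothetical encoder with two hidden layers. In a Siamese network this
    ONE set of weights is applied to both files of a pair."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.layers = [
            (init(n_hidden, n_in), [0.0] * n_hidden),      # hidden layer 1
            (init(n_hidden, n_hidden), [0.0] * n_hidden),  # hidden layer 2
            (init(n_out, n_hidden), [0.0] * n_out),        # linear output layer
        ]

    def embed(self, x):
        for i, (w, b) in enumerate(self.layers):
            x = dense(x, w, b)
            if i < len(self.layers) - 1:  # ReLU on hidden layers only (assumption)
                x = relu(x)
        return x


def contrastive_cosine_loss(encoder, x1, x2, same_family, margin=0.5):
    # Assumed loss: pull same-family pairs together and push other pairs apart
    # until their cosine distance reaches `margin`. Training by gradient
    # descent on this loss over labeled pairs is omitted here.
    d = cosine_distance(encoder.embed(x1), encoder.embed(x2))
    return d * d if same_family else max(0.0, margin - d) ** 2
```

For the baseline, `train` holds (feature set, family label) pairs and `dist` is `jaccard_distance`. For the learned approach, each file would first be mapped through a trained `encoder.embed`, and K-NN would then run on those vectors with `cosine_distance`.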