S. Ponomarev, Jan Durand, Nathan Wallace, T. Atkison
Published in: 2013 IEEE Seventh International Conference on Software Security and Reliability Companion, 2013-06-18. DOI: 10.1109/SERE-C.2013.29 (https://doi.org/10.1109/SERE-C.2013.29)
Evaluation of Random Projection for Malware Classification
Research efforts to develop malicious application detection algorithms have been a priority ever since the discovery of the first "viruses". Various methods are used to search for and identify these malicious applications. One such method, n-gram analysis, can be used to extract features from binary files. These features are then used by machine learning algorithms to classify the files as malicious or benign. However, the resulting high dimensionality of the feature space can make accurate detection impossible in some cases. This is known as "the curse of dimensionality". To counteract this effect, a feature reduction technique known as randomized projection was implemented. With this reduction, classification times decrease, true positive rates increase, and false positive rates decrease. By varying the n-gram size and the target feature size, it is possible to fine-tune the machine learning algorithms to reach an average accuracy of 99%.
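The pipeline described above (byte n-gram extraction followed by a reduction to a smaller target feature size) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the projection matrix is built, so the code assumes random +1/-1 entries scaled by 1/sqrt(k), one common construction. The function names `byte_ngrams` and `random_projection`, the `seed` parameter, and the sample bytes are all hypothetical.

```python
import random
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count overlapping byte n-grams in a binary blob."""
    # An input shorter than n yields an empty range and an empty Counter.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def random_projection(counts: Counter, k: int, seed: int = 0) -> list:
    """Project a sparse n-gram count vector down to k dimensions.

    Assumption (not from the paper): each n-gram owns one row of random
    +1/-1 entries. The row is regenerated from the seed and the n-gram
    bytes on demand, so the full matrix is never stored.
    """
    projected = [0.0] * k
    scale = 1.0 / k ** 0.5
    for gram, count in counts.items():
        # Seeding with bytes is deterministic, so the same n-gram always
        # maps to the same row for a given seed.
        rng = random.Random(seed.to_bytes(8, "big") + gram)
        for j in range(k):
            sign = 1.0 if rng.random() < 0.5 else -1.0
            projected[j] += sign * count * scale
    return projected


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of a binary file.
    sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A, 0x90, 0x00])
    features = byte_ngrams(sample, n=2)
    reduced = random_projection(features, k=4)
    print(len(features), "distinct 2-grams reduced to", len(reduced), "features")
```

The two tunable values in this sketch, `n` and `k`, correspond to the n-gram size and the target feature size that the abstract reports varying. The reduced vectors would then be passed to whichever classifier is being evaluated.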