Prakash Mandayam Comar, Lei Liu, Sabyasachi Saha, A. Nucci, P. Tan
{"title":"基于树变换特征的加权线性核","authors":"Prakash Mandayam Comar, Lei Liu, Sabyasachi Saha, A. Nucci, P. Tan","doi":"10.1145/2396761.2398622","DOIUrl":null,"url":null,"abstract":"Malware detection from network traffic flows is a challenging problem due to data irregularity issues such as imbalanced class distribution, noise, missing values, and heterogeneous types of features. To address these challenges, this paper presents a two-stage classification approach for malware detection. The framework initially employs random forest as a macro-level classifier to separate the malicious from non-malicious network flows, followed by a collection of one-class support vector machine classifiers to identify the specific type of malware. A novel tree-based feature construction approach is proposed to deal with data imperfection issues. As the performance of the support vector machine classifier often depends on the kernel function used to compute the similarity between every pair of data points, designing an appropriate kernel is essential for accurate identification of malware classes. We present a simple algorithm to construct a weighted linear kernel on the tree transformed features and demonstrate its effectiveness in detecting malware from real network traffic data.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Weighted linear kernel with tree transformed features for malware detection\",\"authors\":\"Prakash Mandayam Comar, Lei Liu, Sabyasachi Saha, A. Nucci, P. Tan\",\"doi\":\"10.1145/2396761.2398622\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Malware detection from network traffic flows is a challenging problem due to data irregularity issues such as imbalanced class distribution, noise, missing values, and heterogeneous types of features. To address these challenges, this paper presents a two-stage classification approach for malware detection. The framework initially employs random forest as a macro-level classifier to separate the malicious from non-malicious network flows, followed by a collection of one-class support vector machine classifiers to identify the specific type of malware. A novel tree-based feature construction approach is proposed to deal with data imperfection issues. As the performance of the support vector machine classifier often depends on the kernel function used to compute the similarity between every pair of data points, designing an appropriate kernel is essential for accurate identification of malware classes. We present a simple algorithm to construct a weighted linear kernel on the tree transformed features and demonstrate its effectiveness in detecting malware from real network traffic data.\",\"PeriodicalId\":313414,\"journal\":{\"name\":\"Proceedings of the 21st ACM international conference on Information and knowledge management\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 21st ACM international conference on Information and knowledge management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2396761.2398622\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st ACM international conference on Information and knowledge management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2396761.2398622","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Weighted linear kernel with tree transformed features for malware detection
Malware detection from network traffic flows is a challenging problem due to data irregularity issues such as imbalanced class distribution, noise, missing values, and heterogeneous types of features. To address these challenges, this paper presents a two-stage classification approach for malware detection. The framework initially employs random forest as a macro-level classifier to separate the malicious from non-malicious network flows, followed by a collection of one-class support vector machine classifiers to identify the specific type of malware. A novel tree-based feature construction approach is proposed to deal with data imperfection issues. As the performance of the support vector machine classifier often depends on the kernel function used to compute the similarity between every pair of data points, designing an appropriate kernel is essential for accurate identification of malware classes. We present a simple algorithm to construct a weighted linear kernel on the tree transformed features and demonstrate its effectiveness in detecting malware from real network traffic data.