Classification of Partially Labeled Malicious Web Traffic in the Presence of Concept Drift
Goce Anastasovski, K. Goseva-Popstojanova
2014 IEEE Eighth International Conference on Software Security and Reliability-Companion, 2014-06-30
DOI: 10.1109/SERE-C.2014.31 (https://doi.org/10.1109/SERE-C.2014.31)
Attacks on Web systems have been increasing in recent years. One contributing factor to this trend is the deployment of Web 2.0 technologies. While prior work has characterized and classified malicious Web traffic using supervised learning, little work has applied semi-supervised learning to partially labeled data. In this paper, an incremental semi-supervised algorithm (CSL-Stream) is used to classify malicious Web traffic into multiple classes and to analyze the concept drift and concept evolution phenomena. The work is based on data collected over nine months by a high-interaction honeypot running Web 2.0 applications. The results showed that, on completely labeled data, semi-supervised learning performed only slightly worse than the supervised learning algorithm. More importantly, multiclass classification of partially labeled malicious Web traffic (i.e., with 50% or 25% of sessions labeled) was almost as good as classification of completely labeled data.
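The partially labeled streaming setting described in the abstract (only 50% or 25% of sessions carrying a label, with the data distribution shifting over time) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not CSL-Stream and not the authors' evaluation code. It assumes each session is already a numeric feature vector, hides labels at random to reach a target labeled fraction, and uses a simple incremental nearest-centroid classifier whose centroids decay exponentially so that recent sessions weigh more (one basic way to follow concept drift). The names `mask_labels`, `DecayingCentroidClassifier`, `prequential_accuracy`, and the decay parameter `alpha` are hypothetical, introduced only for this illustration.

```python
import random


def mask_labels(labels, fraction, seed=0):
    """Hypothetical helper: keep each label with probability `fraction`,
    replacing hidden labels with None to simulate a partially labeled stream."""
    rng = random.Random(seed)
    return [y if rng.random() < fraction else None for y in labels]


class DecayingCentroidClassifier:
    """Hypothetical incremental nearest-centroid classifier (NOT CSL-Stream).

    Each class centroid is an exponentially weighted moving average of the
    labeled sessions of that class, so older sessions fade out over time.
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha      # weight given to the newest labeled session
        self.centroids = {}     # class label -> centroid (list of floats)

    def predict(self, x):
        if not self.centroids:
            return None         # nothing learned yet
        def sqdist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda k: sqdist(self.centroids[k]))

    def learn(self, x, y):
        if y not in self.centroids:
            self.centroids[y] = [float(v) for v in x]
        else:
            c = self.centroids[y]
            self.centroids[y] = [(1 - self.alpha) * ci + self.alpha * xi
                                 for ci, xi in zip(c, x)]


def prequential_accuracy(stream, labels, fraction, alpha=0.1, seed=0):
    """Test-then-train: predict every session first, then learn from it only
    if its label is visible. Accuracy is scored against all true labels."""
    visible = mask_labels(labels, fraction, seed)
    model = DecayingCentroidClassifier(alpha)
    correct = 0
    for x, y_true, y_seen in zip(stream, labels, visible):
        if model.predict(x) == y_true:
            correct += 1
        if y_seen is not None:
            model.learn(x, y_seen)
    return correct / len(labels)
```

Calling `prequential_accuracy(sessions, labels, 0.5)` and `prequential_accuracy(sessions, labels, 0.25)` and comparing against `fraction=1.0` mirrors the kind of fully versus partially labeled comparison the abstract reports, though a real study would use richer session features and a genuinely semi-supervised learner that also exploits the unlabeled sessions.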