Towards Unknown Traffic Identification via Embeddings and Deep Autoencoders
Shuyuan Zhao, Yongzheng Zhang, Yafei Sang
2019 26th International Conference on Telecommunications (ICT), 2019-04-08
DOI: https://doi.org/10.1109/ICT.2019.8798803
Citations: 13
Abstract
Traffic classification, a fundamental tool for network management and security, suffers from a critical problem: “unknown traffic”. Unknown traffic is network traffic generated by applications that a traffic classification system has not previously seen (i.e., zero-day applications). The key to solving this problem is the ability to divide mixed unknown traffic into clusters, each of which contains, as far as possible, the traffic of only one application. This paper reports our recent exploration of an n-gram embedding strategy, deep neural networks, and clustering algorithms for constructing an unsupervised scheme for unknown network traffic identification. Experimental results on real-world traces show that our method achieves an average clustering purity of about 97.35% when DNS, DHCP, BitTorrent, SSH, HTTP, IMAP, MySQL, and GitHub traffic is used to simulate unknown traffic.
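The abstract reports clustering purity without defining it. As a minimal sketch, assuming the standard definition (the share of flows that belong to the majority application of their assigned cluster), purity could be computed as below. The function name `clustering_purity` and the toy labels are hypothetical and are not taken from the paper, whose exact averaging procedure may differ.

```python
from collections import Counter, defaultdict


def clustering_purity(cluster_ids, true_labels):
    """Fraction of samples that match the majority label of their cluster.

    Assumption: this is the standard purity metric; the paper's exact
    definition of "average clustering purity" is not given in the abstract.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one sample is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Each cluster contributes the size of its largest single-label group.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: eight flows assigned to three clusters.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
labels = ["DNS", "DNS", "SSH", "HTTP", "HTTP", "HTTP", "SSH", "SSH"]
print(clustering_purity(clusters, labels))  # 7 of 8 flows match their cluster majority
```

Purity rises trivially as the number of clusters grows (one flow per cluster gives a purity of 1.0), so it is usually read alongside the cluster count.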