FLAG: Flow Representation Generator based on Self-supervised Learning for Encrypted Traffic Classification
Wenting Wei, Tianjie Ju, Han Liao, Weike Zhao, Huaxi Gu
5th Asia-Pacific Workshop on Networking (APNet 2021), published 2021-06-24
DOI: 10.1145/3469393.3469394 (https://doi.org/10.1145/3469393.3469394)
Citations: 2
Abstract
Owing to its ability to learn features from large-scale raw data, deep learning (DL) has attracted much attention for encrypted traffic classification. However, most DL-based traffic classifiers rely on very large numbers of labeled samples. Motivated by this, we investigate a self-supervised traffic classifier (FLAG) that does not sacrifice identification accuracy and depends only on a small set of labeled traffic samples together with readily available unlabeled traffic samples. Specifically, focusing on the local short-term characteristics of traffic, we design a preprocessing algorithm, termed N-phrase Extraction, that converts an unlabeled raw traffic dataset into sequences of high-frequency phrases, which serve as input to a Bidirectional Encoder. Because timing characteristics are significant for classification, the Bidirectional Encoder mines the latent timing characteristics of the input sequences and embeds them into robust distributed vector representations, which significantly improves the classifier's performance. Our comprehensive experiments indicate that, on the UNB ISCX VPN-nonVPN dataset, FLAG achieves a true positive rate of 98.65% with 100% of the dataset and 98.07% with 10% of the dataset, outperforming p-FP, FS-Net and Deep Packet.
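The abstract does not specify how phrases are extracted, so the following is only an illustrative sketch of the general idea of turning raw traffic bytes into a sequence of high-frequency "phrases". It assumes a phrase is an overlapping byte n-gram and that only the `top_k` most frequent n-grams receive their own token ID; the function names (`byte_ngrams`, `build_vocab`, `encode`), the parameters `n` and `top_k`, and the use of ID 0 for rare phrases are all hypothetical and not taken from the paper.

```python
from collections import Counter


def byte_ngrams(payload: bytes, n: int) -> list:
    """Return the overlapping n-byte windows of a payload, in order."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def build_vocab(flows, n: int = 2, top_k: int = 2) -> dict:
    """Map the top_k most frequent n-grams across all flows to IDs 1..top_k.

    Assumption: ID 0 is reserved for every phrase outside the vocabulary.
    """
    counts = Counter()
    for payload in flows:
        counts.update(byte_ngrams(payload, n))
    return {gram: idx + 1
            for idx, (gram, _) in enumerate(counts.most_common(top_k))}


def encode(payload: bytes, vocab: dict, n: int = 2) -> list:
    """Convert one payload into a sequence of phrase IDs (0 = rare phrase)."""
    return [vocab.get(gram, 0) for gram in byte_ngrams(payload, n)]


# Toy unlabeled "flows": made-up byte strings, not real captured traffic.
flows = [b"\x16\x03\x01\x16\x03\x01", b"\x16\x03\x03"]
vocab = build_vocab(flows, n=2, top_k=2)
token_ids = encode(b"\x16\x03\x01\x00", vocab, n=2)
```

A sequence such as `token_ids` is the kind of input a sequence encoder could then be pre-trained on without labels; the encoder itself is outside the scope of this sketch.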