FLAG: Flow Representation Generator based on Self-supervised Learning for Encrypted Traffic Classification

5th Asia-Pacific Workshop on Networking (APNet 2021). Pub Date: 2021-06-24. DOI: 10.1145/3469393.3469394

Wenting Wei, Tianjie Ju, Han Liao, Weike Zhao, Huaxi Gu


Citations: 2

Abstract

Thanks to its ability to learn features from large-scale raw data, deep learning (DL) has attracted much attention for encrypted traffic classification. However, most DL-based traffic classifiers rely on enormous numbers of labeled samples. Motivated by this, we investigate a self-supervised traffic classifier (FLAG) that depends only on a small set of labeled traffic samples and on widely available unlabeled traffic samples, without sacrificing identification accuracy. Specifically, focusing on the local short-term characteristics of traffic, we design a preprocessing algorithm, termed N-phrase Extraction, that converts an unlabeled raw traffic dataset into sequences of high-frequency phrases, which serve as input to a Bidirectional Encoder. The Bidirectional Encoder mines the latent timing characteristics of these input sequences and embeds them into robust distributed vector representations, which significantly improves the classifier's performance. Our experiments indicate that, on the UNB ISCX VPN-nonVPN dataset, FLAG achieves a true positive rate of 98.65% when using 100% of the dataset and 98.07% when using 10% of it, outperforming p-FP, FS-Net and Deep Packet.
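The abstract does not spell out how the preprocessing step works, so the following is only a minimal sketch of one plausible reading: treat each flow's raw bytes as a token stream, slide a window of n bytes over it, keep the most frequent n-byte phrases across the unlabeled corpus as a vocabulary, and encode every flow as a sequence of phrase IDs that an encoder could consume. The function names (`ngrams`, `build_vocab`, `encode`), the parameters `n` and `top_k`, and the use of ID 0 for phrases outside the vocabulary are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List


def ngrams(payload: bytes, n: int) -> List[bytes]:
    """Slide a window of n bytes over the payload (hypothetical helper)."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def build_vocab(flows: Iterable[bytes], n: int, top_k: int) -> Dict[bytes, int]:
    """Keep the top_k most frequent n-byte phrases across all flows.

    Assumption: "high-frequency phrases" means the most common byte n-grams
    in the unlabeled corpus. ID 0 is reserved for out-of-vocabulary phrases.
    """
    counts: Counter = Counter()
    for payload in flows:
        counts.update(ngrams(payload, n))
    return {
        phrase: idx
        for idx, (phrase, _) in enumerate(counts.most_common(top_k), start=1)
    }


def encode(payload: bytes, vocab: Dict[bytes, int], n: int) -> List[int]:
    """Map a flow to a sequence of phrase IDs (0 = not in the vocabulary)."""
    return [vocab.get(phrase, 0) for phrase in ngrams(payload, n)]


if __name__ == "__main__":
    # Toy byte strings standing in for unlabeled flows (made-up data).
    flows = [
        b"\x16\x03\x01\x02\x00",
        b"\x16\x03\x01\x00\xa5",
        b"\x17\x03\x03\x00\x20",
    ]
    vocab = build_vocab(flows, n=2, top_k=2)
    for flow in flows:
        print(encode(flow, vocab, n=2))
```

Under this reading, the resulting ID sequences would play the role that token sequences play in a masked-language-model style encoder, which is where the self-supervised pretraining on unlabeled traffic would take place.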


