Efficient Distribution-Derived Features for High-Speed Encrypted Flow Classification

Johan Garcia, Topi Korhonen
Published in: Proceedings of the 2018 Workshop on Network Meets AI & ML
Publication date: 2018-08-07
DOI: 10.1145/3229543.3229548 (https://doi.org/10.1145/3229543.3229548)
Citations: 8

Abstract

Flow classification is an important tool to enable efficient network resource usage, support traffic engineering, and aid QoS mechanisms. As traffic is increasingly encrypted by default, flow classification is turning towards machine learning methods that employ features which remain available for encrypted traffic. In this work we evaluate flow features that capture the distributional properties of in-flow per-packet metrics such as packet size and inter-arrival time. The characteristics of such distributions are often captured with general statistical measures such as standard deviation, variance, etc. We instead propose a Kolmogorov-Smirnov discretization (KSD) algorithm to perform histogram bin construction based on the distributional properties observed in the data. This allows for a richer, histogram-based representation which also requires fewer resources for feature computation than higher-order statistical moments. A comprehensive evaluation using synthetic data from Gaussian and Beta mixtures shows that the KSD approach provides Jensen-Shannon distance results surpassing those of uniform binning and probabilistic binning. An empirical evaluation using live traffic traces from a cellular network further shows that, when coupled with a random forest classifier, the KSD-constructed features improve classification performance compared to general statistical features based on higher-order moments or to alternative bin placement approaches.
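To make the histogram-based representation more concrete, the following is a minimal Python sketch of how per-flow histogram features and a Jensen-Shannon distance could be computed. It is not the KSD algorithm, whose details the abstract does not give. It only illustrates uniform bin placement and an equal-frequency placement, the latter used here as an assumed stand-in for a data-driven alternative. All function names, the example packet sizes, and the choice of four bins are hypothetical.

```python
import math
from bisect import bisect_right


def histogram_features(values, edges):
    """Return normalized bin counts for values, given sorted interior bin edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_right gives the index of the bin that v falls into.
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [c / total for c in counts] if total else counts


def uniform_edges(lo, hi, n_bins):
    """Interior edges of n_bins equal-width bins over [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_edges(sample, n_bins):
    """Interior edges that place roughly equal sample mass in each bin."""
    s = sorted(sample)
    return [s[(i * len(s)) // n_bins] for i in range(1, n_bins)]


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


# Hypothetical packet sizes (bytes) for two flows; not data from the paper.
flow_a = [60, 60, 64, 1500, 1500, 1500, 1480, 52]
flow_b = [200, 220, 210, 400, 380, 390, 410, 205]

edges = uniform_edges(0, 1500, 4)
hist_a = histogram_features(flow_a, edges)
hist_b = histogram_features(flow_b, edges)
print(hist_a, hist_b, js_distance(hist_a, hist_b))
```

Each flow is reduced to a fixed-length vector of bin fractions, which requires only one comparison-and-increment step per packet. With the example values above, the two histograms are [0.5, 0, 0, 0.5] and [0.5, 0.5, 0, 0], giving a Jensen-Shannon distance of about 0.707.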