Performance evaluation of a machine learning algorithm for early application identification

G. Verticale, P. Giacomazzi
{"title":"Performance evaluation of a machine learning algorithm for early application identification","authors":"G. Verticale, P. Giacomazzi","doi":"10.1109/IMCSIT.2008.4747340","DOIUrl":null,"url":null,"abstract":"The early identification of applications through the observation and fast analysis of the associated packet flows is a critical building block of intrusion detection and policy enforcement systems. The simple techniques currently used in practice, such as looking at the transport port numbers or at the application payload, are increasingly less effective for new applications using random port numbers and/or encryption. Therefore, there is increasing interest in machine learning techniques capable of identifying applications by examining features of the associated traffic process such as packet lengths and inter-arrival times. However, these techniques require that the classification algorithm is trained with examples of the traffic generated by the applications to be identified, possibly on the link where the the classifier will operate. In this paper we provide two new contributions. First, we apply the C4.5 decision tree algorithm to the problem of early application identification (i.e. looking at the first packets of the flow) and show that it has better performance than the algorithms proposed in the literature. Moreover, we evaluate the performance of the classifier when training is performed on a link different from the link where the classifier operates. This is an important issue, as a pre-trained portable classifier would greatly facilitate the deployment and management of the classification infrastructure.","PeriodicalId":267715,"journal":{"name":"2008 International Multiconference on Computer Science and Information Technology","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 International Multiconference on Computer Science and Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IMCSIT.2008.4747340","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

The early identification of applications through the observation and fast analysis of the associated packet flows is a critical building block of intrusion detection and policy enforcement systems. The simple techniques currently used in practice, such as looking at the transport port numbers or at the application payload, are increasingly less effective for new applications using random port numbers and/or encryption. Therefore, there is increasing interest in machine learning techniques capable of identifying applications by examining features of the associated traffic process such as packet lengths and inter-arrival times. However, these techniques require that the classification algorithm is trained with examples of the traffic generated by the applications to be identified, possibly on the link where the the classifier will operate. In this paper we provide two new contributions. First, we apply the C4.5 decision tree algorithm to the problem of early application identification (i.e. looking at the first packets of the flow) and show that it has better performance than the algorithms proposed in the literature. Moreover, we evaluate the performance of the classifier when training is performed on a link different from the link where the classifier operates. This is an important issue, as a pre-trained portable classifier would greatly facilitate the deployment and management of the classification infrastructure.
用于早期应用识别的机器学习算法的性能评估
通过观察和快速分析相关数据包流来早期识别应用程序是入侵检测和策略实施系统的关键组成部分。目前在实践中使用的简单技术,例如查看传输端口号或应用程序有效负载,对于使用随机端口号和/或加密的新应用程序越来越不有效。因此,人们对能够通过检查相关流量过程的特征(如数据包长度和间隔到达时间)来识别应用程序的机器学习技术越来越感兴趣。然而,这些技术需要使用要识别的应用程序生成的流量示例来训练分类算法,这些示例可能是在分类器将运行的链接上。在本文中,我们提供了两个新的贡献。首先,我们将C4.5决策树算法应用于早期应用识别问题(即查看流的第一个数据包),并表明它比文献中提出的算法具有更好的性能。此外,当在与分类器运行的链接不同的链接上执行训练时,我们评估分类器的性能。这是一个重要的问题,因为预先训练的可移植分类器将极大地促进分类基础设施的部署和管理。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信