Performance evaluation of a machine learning algorithm for early application identification

2008 International Multiconference on Computer Science and Information Technology, published 2008-10-01, DOI: 10.1109/IMCSIT.2008.4747340

G. Verticale, P. Giacomazzi

Citations: 10

Abstract

The early identification of applications through the observation and fast analysis of the associated packet flows is a critical building block of intrusion detection and policy enforcement systems. The simple techniques currently used in practice, such as inspecting the transport port numbers or the application payload, are becoming less and less effective against new applications that use random port numbers and/or encryption. Therefore, there is growing interest in machine learning techniques that identify applications by examining features of the associated traffic process, such as packet lengths and inter-arrival times. However, these techniques require that the classification algorithm be trained with examples of the traffic generated by the applications to be identified, possibly on the link where the classifier will operate. In this paper we provide two new contributions. First, we apply the C4.5 decision tree algorithm to the problem of early application identification (i.e., looking only at the first packets of the flow) and show that it outperforms the algorithms previously proposed in the literature. Second, we evaluate the performance of the classifier when training is performed on a link different from the link where the classifier operates. This is an important issue, as a pre-trained portable classifier would greatly facilitate the deployment and management of the classification infrastructure.
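
As an illustration of the approach the abstract describes, the Python sketch below grows a simplified C4.5-style decision tree on features taken from the first packets of each flow, trains it on flows from one link and scores it on another. It is not the authors' implementation. The flow representation as (timestamp, length) pairs, the use of the first four packets, the helper names (early_features, best_split, build_tree, classify, accuracy), the class labels and the toy flows for "link A" and "link B" are all hypothetical. Like C4.5, the sketch picks binary thresholds on continuous attributes by gain ratio, but it omits pruning, missing-value handling and C4.5's restriction to splits with at least average information gain.

```python
import math
from collections import Counter


def early_features(packets, n=4):
    """Hypothetical feature vector from the first n packets of a flow:
    n packet lengths followed by n-1 inter-arrival times.
    `packets` is a list of (timestamp_seconds, length_bytes); assumes >= n packets."""
    first = packets[:n]
    lengths = [length for _, length in first]
    gaps = [t2 - t1 for (t1, _), (t2, _) in zip(first, first[1:])]
    return lengths + gaps


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split(rows, labels):
    """Best binary split as (gain_ratio, feature, threshold), or None if none exists."""
    base = entropy(labels)
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[f] <= thr]
            right = [lab for r, lab in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue  # degenerate split (possible through float rounding)
            p = len(left) / len(labels)
            gain = base - p * entropy(left) - (1 - p) * entropy(right)
            split_info = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
            ratio = gain / split_info  # C4.5's gain ratio
            if best is None or ratio > best[0]:
                best = (ratio, f, thr)
    return best


def build_tree(rows, labels, min_size=2):
    """Grow an unpruned tree: a leaf is a label, an inner node is
    (feature, threshold, left_subtree, right_subtree)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(labels) < min_size:
        return majority
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return majority  # no split improves class purity
    _, f, thr = split
    left = [i for i, r in enumerate(rows) if r[f] <= thr]
    right = [i for i, r in enumerate(rows) if r[f] > thr]
    return (f, thr,
            build_tree([rows[i] for i in left], [labels[i] for i in left], min_size),
            build_tree([rows[i] for i in right], [labels[i] for i in right], min_size))


def classify(tree, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


def accuracy(tree, rows, labels):
    return sum(classify(tree, r) == lab for r, lab in zip(rows, labels)) / len(labels)


# Made-up toy flows, NOT data from the paper: (timestamp, length) per packet.
link_a = [
    ([(0.00, 60), (0.01, 1500), (0.02, 1500), (0.03, 1500)], "bulk"),
    ([(0.00, 60), (0.02, 1460), (0.03, 1500), (0.05, 1500)], "bulk"),
    ([(0.00, 60), (0.30, 90), (0.60, 80), (0.90, 100)], "interactive"),
    ([(0.00, 60), (0.25, 100), (0.55, 90), (0.80, 80)], "interactive"),
]
link_b = [
    ([(0.00, 60), (0.05, 1400), (0.06, 1500), (0.08, 1500)], "bulk"),
    ([(0.00, 60), (0.40, 120), (0.70, 90), (1.00, 110)], "interactive"),
]

X_a = [early_features(p) for p, _ in link_a]
y_a = [label for _, label in link_a]
X_b = [early_features(p) for p, _ in link_b]
y_b = [label for _, label in link_b]

tree = build_tree(X_a, y_a)       # train on link A only
print(tree)                       # (1, 780.0, 'interactive', 'bulk')
print(accuracy(tree, X_b, y_b))   # 1.0 on this toy data
```

Training on one link and scoring on another mirrors, in miniature, the cross-link experiment the abstract describes; with real traces, features such as inter-arrival times may shift between links, which is what that evaluation probes. For practical work, Weka's J48 is a widely used C4.5 implementation, whereas scikit-learn's DecisionTreeClassifier implements CART rather than C4.5.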
