M. R. Oliveira, R. Valadas, M. Pietrzyk, D. Collange
{"title":"Stability of flow features for the identification of Internet applications","authors":"M. R. Oliveira, R. Valadas, M. Pietrzyk, D. Collange","doi":"10.1109/NETWKS.2014.6959223","DOIUrl":null,"url":null,"abstract":"One important requirement associated with the deployment of large scale classification infrastructures is the portability of classifiers, which allows a small number of pre-trained classifiers to be used on many sites and time periods. The portability can be severely degraded if the flow features used in the classification process lack stability, i.e. if they do not preserve their most relevant statistical properties across different sites and time periods. In this paper we propose a statistical procedure to evaluate the stability of flow features, which resorts to the notion of effect size. The procedure is used challenge the stability of popular flow features, such as the direction and size of the first four packets of a TCP connection. Our results, obtained with three high-quality traffic traces, clearly show that only some applications are portable, when using these features as discriminators. We also provide evidence of these findings based on the operation of the protocols underlying the Internet applications.","PeriodicalId":410892,"journal":{"name":"2014 16th International Telecommunications Network Strategy and Planning Symposium (Networks)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 16th International Telecommunications Network Strategy and Planning Symposium (Networks)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NETWKS.2014.6959223","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
One important requirement associated with the deployment of large scale classification infrastructures is the portability of classifiers, which allows a small number of pre-trained classifiers to be used on many sites and time periods. The portability can be severely degraded if the flow features used in the classification process lack stability, i.e. if they do not preserve their most relevant statistical properties across different sites and time periods. In this paper we propose a statistical procedure to evaluate the stability of flow features, which resorts to the notion of effect size. The procedure is used challenge the stability of popular flow features, such as the direction and size of the first four packets of a TCP connection. Our results, obtained with three high-quality traffic traces, clearly show that only some applications are portable, when using these features as discriminators. We also provide evidence of these findings based on the operation of the protocols underlying the Internet applications.