Towards Evaluating Quality of Datasets for Network Traffic Domain

2021 17th International Conference on Network and Service Management (CNSM), Pub Date: 2021-10-25, DOI: 10.23919/CNSM52442.2021.9615601

Dominik Soukup, Peter Tisovčík, Karel Hynek, T. Čejka


Citations: 3

Abstract




This paper deals with the quality of network traffic datasets created to train and validate machine learning classification and detection methods. Naturally, there is a long history of research targeted at data quality; however, it focuses mainly on data consistency, validity, precision, and other metrics, which are insufficient for network traffic use cases. The rise of machine learning in network monitoring applications requires a new methodology for evaluating datasets. There is a need to evaluate and compare traffic samples captured under different conditions and to determine the usability of already captured and annotated data. This paper aims to explain a use case of dataset creation, propose definitions regarding the quality of network traffic datasets, and finally, describe a framework for dataset analysis.
