AppClassNet: a commercial-grade dataset for application identification research: ACM SIGCOMM Computer Communication Review: Vol 52, No 3

IF 2.8; Zone 4, Computer Science; JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Sigcomm Computer Communication Review Pub Date : 2022-09-06 DOI:https://dl.acm.org/doi/10.1145/3561954.3561958

Chao Wang, Alessandro Finamore, Lixuan Yang, Kevin Fauvel, Dario Rossi


Citations: 0

Abstract

The recent success of Artificial Intelligence (AI) is rooted in several concomitant factors, namely theoretical progress coupled with an abundance of data and computing power. Large companies can take advantage of a deluge of data that is typically withheld from the research community due to privacy or business-sensitivity concerns, and this is particularly true for networking data. Therefore, the lack of high-quality data is often recognized as one of the main factors currently preventing networking research from fully leveraging the potential of AI methodologies.

Following numerous requests from the scientific community, we release AppClassNet, a commercial-grade dataset for benchmarking traffic classification and management methodologies. AppClassNet is significantly larger than the datasets generally available to the academic community in terms of both the number of samples and the number of classes, and reaches a scale similar to that of the popular ImageNet dataset commonly used in the computer vision literature. To avoid leaking user- and business-sensitive information, we suitably anonymized the dataset, while empirically showing that it remains a relevant benchmark for algorithmic research. In this paper, we describe the public dataset and our anonymization process. We hope that AppClassNet can help other researchers address more complex, commercial-grade problems in the broad field of traffic classification and management.


Source journal

ACM Sigcomm Computer Communication Review (Engineering and Technology, Computer Science: Information Systems)

CiteScore

6.90

Self-citation rate

3.60%

Articles published

Review time

4-8 weeks

About the journal: Computer Communication Review (CCR) is an online publication of the ACM Special Interest Group on Data Communication (SIGCOMM) and publishes articles on topics within the SIG's field of interest. Technical papers accepted to CCR typically report on practical advances or the practical applications of theoretical advances. CCR serves as a forum for interesting and novel ideas at an early stage in their development. The focus is on timely dissemination of new ideas that may help trigger additional investigations. While innovation and timeliness are the major criteria for acceptance, technical robustness and readability are also considered in the review process. We particularly encourage papers with early evaluation or feasibility studies.