AppClassNet: a commercial-grade dataset for application identification research: ACM SIGCOMM Computer Communication Review: Vol 52, No 3

IF 2.2 | Zone 4 (Computer Science) | JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS
Chao Wang, Alessandro Finamore, Lixuan Yang, Kevin Fauvel, Dario Rossi
DOI: https://dl.acm.org/doi/10.1145/3561954.3561958
Published: 2022-09-06 (Journal Article)

Abstract

The recent success of Artificial Intelligence (AI) is rooted in several concomitant factors, namely theoretical progress coupled with an abundance of data and computing power. Large companies can take advantage of a deluge of data, which is typically withheld from the research community due to privacy or business-sensitivity concerns; this is particularly true for networking data. Therefore, the lack of high-quality data is often recognized as one of the main factors currently preventing networking research from fully leveraging the potential of AI methodologies.

Following numerous requests we received from the scientific community, we release AppClassNet, a commercial-grade dataset for benchmarking traffic classification and management methodologies. AppClassNet is significantly larger than the datasets generally available to the academic community in terms of both the number of samples and the number of classes, and reaches a scale similar to that of the popular ImageNet dataset commonly used in the computer vision literature. To avoid leaking user- and business-sensitive information, we appropriately anonymized the dataset, while empirically showing that it still represents a relevant benchmark for algorithmic research. In this paper, we describe the public dataset and our anonymization process. We hope that AppClassNet can help other researchers address more complex commercial-grade problems in the broad field of traffic classification and management.

Source journal
ACM Sigcomm Computer Communication Review (Engineering and Technology, Computer Science: Information Systems)
CiteScore: 6.90
Self-citation rate: 3.60%
Articles published: 20
Review time: 4-8 weeks
About the journal: Computer Communication Review (CCR) is an online publication of the ACM Special Interest Group on Data Communication (SIGCOMM) and publishes articles on topics within the SIG's field of interest. Technical papers accepted to CCR typically report on practical advances or on practical applications of theoretical advances. CCR serves as a forum for interesting and novel ideas at an early stage of their development. The focus is on the timely dissemination of new ideas that may help trigger further investigation. While innovation and timeliness are the major acceptance criteria, technical robustness and readability are also considered in the review process. Papers with early evaluations or feasibility studies are particularly encouraged.