Chao Wang, Alessandro Finamore, Lixuan Yang, Kevin Fauvel, Dario Rossi
AppClassNet: a commercial-grade dataset for application identification research
ACM SIGCOMM Computer Communication Review, Vol 52, No 3. Published 2022-09-06. https://dl.acm.org/doi/10.1145/3561954.3561958
Abstract
The recent success of Artificial Intelligence (AI) is rooted in several concomitant factors, namely theoretical progress coupled with an abundance of data and computing power. Large companies can take advantage of a deluge of data that is typically withheld from the research community due to privacy or business sensitivity concerns, and this is particularly true for networking data. Therefore, the lack of high-quality data is often recognized as one of the main factors currently preventing networking research from fully leveraging the potential of AI methodologies.
Following numerous requests from the scientific community, we release AppClassNet, a commercial-grade dataset for benchmarking traffic classification and management methodologies. AppClassNet is significantly larger than the datasets generally available to the academic community in terms of both the number of samples and the number of classes, and reaches a scale similar to that of the popular ImageNet dataset commonly used in the computer vision literature. To avoid leaking user- and business-sensitive information, we suitably anonymized the dataset, while empirically showing that it still represents a relevant benchmark for algorithmic research. In this paper, we describe the public dataset and our anonymization process. We hope that AppClassNet can be instrumental in helping other researchers address more complex commercial-grade problems in the broad field of traffic classification and management.
About the journal:
Computer Communication Review (CCR) is an online publication of the ACM Special Interest Group on Data Communication (SIGCOMM) and publishes articles on topics within the SIG's field of interest. Technical papers accepted to CCR typically report on practical advances or the practical applications of theoretical advances. CCR serves as a forum for interesting and novel ideas at an early stage in their development. The focus is on timely dissemination of new ideas that may help trigger additional investigations. While innovation and timeliness are the major criteria for acceptance, technical robustness and readability are also considered in the review process. We particularly encourage papers with early evaluation or feasibility studies.