PINOT: Programmable Infrastructure for Networking

Roman Beltiukov, Sanjay Chandrasekaran, Arpit Gupta, W. Willinger
Proceedings of the Applied Networking Research Workshop, published 2023-07-22
DOI: 10.1145/3606464.3606485 (https://doi.org/10.1145/3606464.3606485)
Citations: 2

Abstract

As modern network communication moves closer to being fully encrypted and hence less exposed to passive monitoring, traditional network measurements that rely on unencrypted fields in captured traffic provide less and less visibility into today’s network traffic. At the same time, approaches that use machine learning (ML) techniques to extract subtle temporal and spatial patterns from encrypted packet-level traces have shown great promise in offsetting the lack of visibility due to encryption [1–3, 5–7, 10–15, 18, 23, 24]. Despite their promise, ML-based approaches often have a credibility problem that arises from the quality of the underlying training data. Given the challenges of curating high-quality training data at scale, researchers typically end up collecting their own data (or reusing existing third-party or synthetic data), often from small-scale testbeds. Such data is generally of low quality because it is not representative of the target environment, is collected over too short a time period, or is measured at too coarse a granularity. Learning models trained on such data tend to be vulnerable to different failure modes that make them not credible [8]. This observation raises a fundamental question: how can we develop credible ML artifacts for managing encrypted network traffic? This paper describes our ongoing efforts to enable researchers and practitioners to develop more credible ML artifacts by lowering the effort required to collect more high-quality data for a wide range of learning problems from realistic and representative network environments.