PINOT: Programmable Infrastructure for Networking
Roman Beltiukov, Sanjay Chandrasekaran, Arpit Gupta, W. Willinger
Proceedings of the Applied Networking Research Workshop, published 2023-07-22. DOI: 10.1145/3606464.3606485
Citations: 2
Abstract
As modern network communication moves closer to being fully encrypted, and hence less exposed to passive monitoring, traditional network measurements that rely on unencrypted fields in captured traffic provide less and less visibility into today's network traffic. At the same time, approaches that use machine learning (ML) techniques to extract subtle temporal and spatial patterns from encrypted packet-level traces have shown great promise in offsetting the loss of visibility due to encryption [1–3, 5–7, 10–15, 18, 23, 24]. Despite this promise, ML-based approaches often suffer from a credibility problem rooted in the quality of their training data. Given the challenges of curating high-quality training data at scale, researchers typically end up collecting their own data (or reusing existing third-party or synthetic data), often from small-scale testbeds. Such data is generally of low quality: it is not representative of the target environment, collected over too short a time period, or measured at too coarse a granularity. Models trained on such data tend to be vulnerable to failure modes that undermine their credibility [8]. This observation raises a fundamental question: how can we develop credible ML artifacts for managing encrypted network traffic? This paper describes our ongoing efforts to enable researchers and practitioners to develop more credible ML artifacts by lowering the effort required to collect high-quality data, for a wide range of learning problems, from realistic and representative network environments.
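To make the abstract's key idea concrete, the following is a minimal illustrative sketch (not the paper's method, and using entirely synthetic data with hypothetical class names) of why ML can still classify encrypted traffic: packet sizes and inter-arrival times remain observable side channels even when payloads are opaque, and a trivial nearest-centroid classifier can separate traffic classes on those features alone.

```python
# Illustrative sketch, NOT the system described in the paper: classify
# encrypted flows by side-channel features (packet sizes, inter-arrival
# times) that stay visible despite payload encryption. All flows below
# are synthetic and the labels "video"/"chat" are hypothetical.
from statistics import mean


def features(flow):
    """Map a flow, given as [(timestamp_s, packet_size_bytes), ...],
    to (mean packet size, mean inter-arrival time)."""
    times = [t for t, _ in flow]
    sizes = [s for _, s in flow]
    iats = [b - a for a, b in zip(times, times[1:])]
    return (mean(sizes), mean(iats))


def nearest_centroid(train, query):
    """train: {label: [feature_vector, ...]}. Return the label whose
    feature centroid is closest (squared Euclidean) to the query."""
    def centroid(vecs):
        return tuple(mean(coord) for coord in zip(*vecs))

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = {label: centroid(vecs) for label, vecs in train.items()}
    return min(centroids, key=lambda label: dist2(centroids[label], query))


# Synthetic training flows: "video" sends large packets at a steady rate;
# "chat" sends small packets with long gaps.
video = [(0.00, 1400), (0.02, 1380), (0.04, 1420), (0.06, 1400)]
chat = [(0.0, 120), (1.1, 90), (2.3, 150), (3.0, 110)]
train = {"video": [features(video)], "chat": [features(chat)]}

# An unseen flow with large, closely spaced packets lands near "video".
query = features([(0.00, 1350), (0.03, 1410), (0.05, 1390)])
print(nearest_centroid(train, query))  # -> video
```

The sketch also hints at the paper's credibility concern: centroids fit to a handful of synthetic flows will not transfer to a real network, which is exactly the training-data representativeness gap the abstract identifies.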