Robust and Lightweight Modeling of IoT Network Behaviors From Raw Traffic Packets

IEEE Transactions on Machine Learning in Communications and Networking. Pub Date: 2024-12-16. DOI: 10.1109/TMLCN.2024.3517613

Aleksandar Pasquini; Rajesh Vasa; Irini Logothetis; Hassan Habibi Gharakheili; Alexander Chambers; Minh Tran

Volume 3, pages 98-116. Available at: https://ieeexplore.ieee.org/document/10802939/

Citations: 0

Abstract

Machine learning (ML)-based techniques are increasingly used for network management tasks such as intrusion detection, application identification, or asset management. Recent studies show that neural network-based traffic analysis can achieve performance comparable to ML pipelines built on human-engineered features. However, neural networks deliver this performance at a higher computational cost and complexity, because high-throughput traffic conditions necessitate specialized hardware for real-time operation. This paper presents lightweight models for encoding the characteristics of Internet-of-Things (IoT) network packets and makes three contributions. (1) We present two strategies for encoding packets (regardless of their size, encryption, and protocol) as integer vectors: a shallow lightweight neural network and compression. Using a public dataset containing about 8 million packets emitted by 22 IoT device types, we show that the encoded packets can form complete (up to 80%) and homogeneous (up to 89%) clusters. (2) We demonstrate the efficacy of the generated encodings in a downstream classification task and quantify their computing costs. We train three multi-class models to predict the IoT class from network packets and show that our models can achieve the same level of accuracy (94%) as deep neural network embeddings, at computing costs up to 10 times lower. (3) We examine how the amount of packet data (headers and payload) affects prediction quality, and demonstrate that using Internet Protocol (IP) payloads strikes a balance between prediction accuracy (99%) and cost. Together with the cost-efficacy of the models, this capability can yield rapid and accurate predictions that meet the requirements of network operators.
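The abstract reports cluster quality as completeness and homogeneity but does not say how these are computed. The sketch below assumes the standard entropy-based definitions: homogeneity is 1 - H(class | cluster) / H(class), and completeness is 1 - H(cluster | class) / H(cluster). The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(targets, given):
    """H(targets | given), estimated from two aligned label sequences."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    marginal = Counter(given)
    return -sum((c / n) * math.log(c / marginal[g]) for (g, _), c in joint.items())


def homogeneity_completeness(device_types, cluster_ids):
    """Return (homogeneity, completeness) under the assumed entropy-based definitions.

    Homogeneity is high when each cluster holds packets of a single device type;
    completeness is high when each device type falls into a single cluster.
    """
    h_class = _entropy(device_types)
    h_cluster = _entropy(cluster_ids)
    homogeneity = (
        1.0 if h_class == 0
        else 1.0 - _conditional_entropy(device_types, cluster_ids) / h_class
    )
    completeness = (
        1.0 if h_cluster == 0
        else 1.0 - _conditional_entropy(cluster_ids, device_types) / h_cluster
    )
    return homogeneity, completeness


# Toy example (hypothetical labels): one device type is split across two clusters.
device_types = ["camera", "camera", "plug", "plug"]
cluster_ids = [0, 0, 1, 2]
h, c = homogeneity_completeness(device_types, cluster_ids)
```

In this toy case every cluster contains a single device type, so homogeneity is 1.0, while the "plug" packets are split across two clusters, which lowers completeness to 2/3. Read this way, the reported 89% and 80% would describe how pure the clusters are and how well each device type stays together, respectively, if the paper uses these definitions.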

Source journal

IEEE Transactions on Machine Learning in Communications and Networking

Self-citation rate

0.00%

Number of publications