The intrinsic dimensionality of network datasets and its applications

Impact Factor: 0.9 | JCR: Q4 | Category: Computer Science, Information Systems

Journal of Computer Security | Pub Date: 2023-11-10 | DOI: 10.3233/jcs-220131

Matt Gorbett, Caspian Siebert, Hossein Shirazi, Indrakshi Ray


Citations: 0

Abstract

Modern network infrastructures are in a constant state of transformation, in large part due to the exponential growth of Internet of Things (IoT) devices. The unique properties of IoT-connected networks, such as heterogeneity and non-standardized protocols, have created critical security holes and network mismanagement. In this paper, we propose a new measurement tool, Intrinsic Dimensionality (ID), to aid in analyzing and classifying network traffic. A proxy for dataset complexity, ID can be used to understand the network as a whole, aiding in tasks such as network management and provisioning. We use ID to evaluate several modern network datasets empirically, showing that, for network and device-level data generated using IoT methodologies, the ID of the data fits into a low-dimensional representation. Additionally, we explore network data complexity at the sample level using Local Intrinsic Dimensionality (LID) and propose a novel unsupervised intrusion detection technique, the Weighted Hamming LID Estimator. We show that the algorithm performs better on IoT network datasets than the Autoencoder, KNN, and Isolation Forests. Finally, we propose the use of synthetic data as an additional tool for both network data measurement and intrusion detection. Synthetically generated data can aid in building a more robust network dataset, while also helping in downstream tasks such as machine-learning-based intrusion detection models. We explore the effects of synthetic data on ID measurements, as well as its role in intrusion detection systems.
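To make the idea of measuring local intrinsic dimensionality concrete, the following is a minimal sketch of a standard nearest-neighbor maximum-likelihood LID estimator. It is not the paper's Weighted Hamming LID Estimator, whose details the abstract does not give. The function name `lid_mle`, the neighborhood size `k = 20`, the Euclidean distance, and the randomly generated stand-in data are all assumptions made for illustration.

```python
import numpy as np


def lid_mle(points: np.ndarray, k: int = 20) -> np.ndarray:
    """Per-sample LID via the nearest-neighbor maximum-likelihood estimator.

    With sorted neighbor distances r_1 <= ... <= r_k for a sample, the
    estimate is -1 / mean_i(log(r_i / r_k)).
    """
    # Brute-force pairwise Euclidean distances; fine for a few thousand rows.
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (points @ points.T)
    dists = np.sqrt(np.maximum(sq_dists, 0.0))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor

    # Distances to the k nearest neighbors, ascending, floored to avoid log(0).
    knn = np.maximum(np.sort(dists, axis=1)[:, :k], 1e-12)
    mean_log_ratio = np.log(knn / knn[:, -1:]).mean(axis=1)
    # Clamp so exact ties (all ratios equal to 1) do not divide by zero.
    return -1.0 / np.minimum(mean_log_ratio, -1e-12)


# Hypothetical stand-in for traffic features: 2 latent factors spread over 10 columns.
rng = np.random.default_rng(0)
low_dim = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 10))
full_dim = rng.normal(size=(1000, 10))

print("mean LID, 2 latent factors:", lid_mle(low_dim).mean())
print("mean LID, 10 independent features:", lid_mle(full_dim).mean())
```

On data like this, the first average should come out much closer to 2 than to 10, even though both arrays have ten columns; that gap is the sense in which ID acts as a proxy for dataset complexity. Averaging per-sample values is only one rough way to get a dataset-level figure, and the per-sample values themselves could in principle be thresholded as anomaly scores, though how the paper does this is not stated here.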

Source journal

Journal of Computer Security (Computer Science, Information Systems)

CiteScore: 1.70

Self-citation rate: 0.00%

Journal description: The Journal of Computer Security presents research and development results of lasting significance in the theory, design, implementation, analysis, and application of secure computer systems and networks. It will also provide a forum for ideas about the meaning and implications of security and privacy, particularly those with important consequences for the technical community. The Journal provides an opportunity to publish articles of greater depth and length than is possible in the proceedings of various existing conferences, while addressing an audience of researchers in computer security who can be assumed to have a more specialized background than the readership of other archival publications.