Analyzing ML-Based IDS over Real-Traffic

Vol 4 Issue 3, Pub Date: 2022-06-30, DOI: 10.33411/ijist/2022040306

Shafqat Ali Siyyal, Faheem Yar Khuawar, E. Saba, A. L. Memon, Muhammad Raza Shaikh


Citations: 2

Abstract

The rapid growth of computer networks has caused a significant increase in malicious traffic, prompting the use of Intrusion Detection Systems (IDSs) to protect against this ever-growing attack traffic. A great number of IDSs have been developed, each with its own strengths and weaknesses. Most IDS research and development relies purely on simulated, non-updated datasets because real datasets are unavailable; for instance, KDD '99 and CIC-IDS-18, which are widely used by researchers, are not sufficient to represent real-traffic scenarios. Moreover, these one-time-generated static datasets cannot keep pace with rapid changes in network patterns. To overcome these problems, we propose a framework for generating a full-feature, unbiased, real-traffic-based, up-to-date custom dataset that addresses the limitations of existing datasets. In this paper, the complete methodology, covering the network testbed, data acquisition, and attack scenarios, is discussed. The generated dataset contains more than 70 features and covers different types of attacks, namely DoS, DDoS, Portscan, Brute-Force, and Web attacks. The custom-generated dataset is then compared with various available datasets on seven factors: updates, practical-to-generate, realness, attack diversity, flexibility, availability, and interoperability. Additionally, we trained different ML-based classifiers on our custom-generated dataset and evaluated them using performance metrics. The generated dataset is publicly available. This research is expected to help researchers develop effective IDSs and real-traffic-based, updated datasets.
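The abstract does not say which performance metrics were used to evaluate the classifiers. As a minimal sketch, assuming the usual choices for multi-class IDS evaluation (accuracy plus per-class precision, recall, and F1), the computation could look like the following. The function names and the toy labels are hypothetical and invented for illustration; they are not results or code from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of flows whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy labels (assumption: invented for illustration, not taken from the dataset).
y_true = ["Benign", "DoS", "DoS", "Portscan", "Benign", "DDoS"]
y_pred = ["Benign", "DoS", "Benign", "Portscan", "Benign", "DDoS"]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
    for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
        print(f"{label:10s} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Per-class figures matter here because attack classes in intrusion datasets are typically far rarer than benign traffic, so overall accuracy alone can hide poor detection of a specific attack type.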


