Overview and Exploratory Analyses of CICIDS 2017 Intrusion Detection Dataset

Journal of Systems Engineering and Information Technology (JOSEIT). Publication date: 2023-09-17. DOI: 10.29207/joseit.v2i2.5411

Akinyemi Oyelakin, Ameen A.O., Ogundele T.S., Salau-Ibrahim T., Abdulrauf U.T., Olufadi H.I., Ajiboye I.K., Muhammad-Thani S., Adeniji I.A.


Citations: 0

Abstract

Intrusion detection systems (IDSs) are used to detect attacks on a network. Machine learning (ML) approaches have been widely used to build IDSs because they tend to be more accurate when trained on a very large and representative dataset. One of the benchmark datasets recently used to build ML-based intrusion detection models is CICIDS2017. The dataset is organized into eight groups and was obtained from the dataset repository of the Canadian Institute for Cybersecurity (CIC). It is available both as raw packet captures (PCAP) and as network-flow records. This study used the network-flow records of CICIDS2017 because they contain newer attacks, are very large, and are useful for traffic analysis. Exploratory data analysis (EDA) techniques were used to reveal various characteristics of the dataset. The overall objective is to provide more insight into the nature, structure, and issues of the dataset, so as to identify the best ways of using it to build improved ML-based IDS models. Furthermore, some of the open problems that can arise from using the dataset in ML-based IDSs are highlighted, and possible solutions are briefly discussed. The EDA revealed important relationships between the input variables and the target class. The study concludes that EDA can better inform decisions in future IDS research that uses the dataset.
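The abstract does not spell out which EDA techniques were applied. As an illustration only, the following is a minimal Python sketch of three common first checks on flow records: the class distribution (to expose imbalance), a count of non-finite feature values, and per-class means of one feature (a first look at how an input variable relates to the target class). It assumes CSV flow files whose target column is named `Label` once stray whitespace is stripped from the header names; the function names are hypothetical, only the standard library is used, and the sample data is synthetic, not taken from CICIDS2017.

```python
import csv
import io
import math
from collections import Counter, defaultdict
from typing import Dict, List, TextIO

# Assumption: the target-class column is called "Label" once header whitespace is stripped.
LABEL_COLUMN = "Label"


def read_flows(handle: TextIO) -> List[Dict[str, str]]:
    """Read flow records from a CSV stream, stripping stray spaces from header names."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]
    # Skip blank lines; map each remaining row onto the cleaned header.
    return [dict(zip(header, row)) for row in reader if row]


def to_float(value: str) -> float:
    """Parse a numeric cell; empty or non-numeric text becomes NaN ("Infinity" parses to inf)."""
    try:
        return float(value)
    except ValueError:
        return math.nan


def class_distribution(rows: List[Dict[str, str]]) -> Counter:
    """Count records per target class, which exposes any class imbalance."""
    return Counter(row[LABEL_COLUMN].strip() for row in rows)


def non_finite_counts(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Count NaN, infinite, empty or non-numeric cells in each feature column."""
    counts: Counter = Counter()
    for row in rows:
        for column, value in row.items():
            if column != LABEL_COLUMN and not math.isfinite(to_float(value)):
                counts[column] += 1
    return dict(counts)


def mean_by_class(rows: List[Dict[str, str]], feature: str) -> Dict[str, float]:
    """Per-class mean of one feature over its finite values only."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        value = to_float(row[feature])
        if math.isfinite(value):
            label = row[LABEL_COLUMN].strip()
            sums[label] += value
            counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


if __name__ == "__main__":
    # Tiny synthetic sample (not real CICIDS2017 data); note the leading spaces in the header.
    sample = io.StringIO(
        " Flow Duration, Flow Bytes/s, Label\n"
        "100,2000,BENIGN\n"
        "300,Infinity,BENIGN\n"
        "5,NaN,DDoS\n"
        "15,900,DDoS\n"
        "10,,PortScan\n"
    )
    flows = read_flows(sample)
    print(class_distribution(flows))
    print(non_finite_counts(flows))
    print(mean_by_class(flows, "Flow Duration"))
```

On the synthetic sample, the sketch reports two BENIGN, two DDoS and one PortScan record, three non-finite `Flow Bytes/s` cells, and mean `Flow Duration` values of 200.0, 10.0 and 10.0 for the three classes. The `csv` module keeps the example dependency-free; for the full dataset, which is very large, a columnar library such as pandas would likely be the more practical choice.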
