Detecting Cybersecurity Attacks Using Different Network Features with LightGBM and XGBoost Learners

2020 IEEE Second International Conference on Cognitive Machine Intelligence (CogMI) · Pub Date: 2020-10-01 · DOI: 10.1109/CogMI50398.2020.00032

Joffrey L. Leevy, John T. Hancock, R. Zuech, T. Khoshgoftaar


Citations: 18

Abstract




CSE-CIC-IDS2018 is an intrusion detection dataset containing roughly 16,000,000 normal and anomalous instances, with about 17% of these instances representing attack traffic. Our big data study has two parts: ensemble feature selection and model comparison. In the first part, we select features from the dataset for input to the two classifiers that we employ in the second part. In the second part, we evaluate the performance of the classifiers with Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and F1-score. The outcome of our experiments enables us to answer three research questions. The first question is, "Does feature selection impact performance of classifiers in terms of AUC and F1-score?" The second question is, "Does including the Destination_Port categorical feature significantly impact performance of LightGBM in terms of AUC and F1-score?" And our third question is, "Does the choice of classifier, LightGBM or XGBoost, significantly impact performance in terms of AUC and F1-score?" For CSE-CIC-IDS2018, we conclude that feature selection and classifier choice impact performance scores, and that Destination_Port is a significant feature for LightGBM. In our case study, we present the application of an ensemble feature selection technique and an analysis of its impact. To the best of our knowledge, we are the first to apply this technique to the CSE-CIC-IDS2018 dataset.
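The abstract names AUC and F1-score as the two evaluation metrics but does not say how they are computed. The following is a minimal, dependency-free Python sketch of both. It is not the authors' code. AUC is computed as the normalized Mann-Whitney U statistic. F1-score is computed for the attack class at a single decision threshold, and the 0.5 default is an assumption. The function names `roc_auc` and `f1_score` and the eight-instance toy data are hypothetical illustrations, not values from CSE-CIC-IDS2018.

The contrast between the two metrics matters when only about 17% of instances are attacks. AUC summarizes how well a classifier ranks attacks above normal traffic across all thresholds. F1-score reflects the precision and recall of the attack class at one threshold.

```python
# Hypothetical, library-free sketch of the two metrics named in the abstract.
# Function names, toy data, and the 0.5 default threshold are assumptions.
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the normalized Mann-Whitney U statistic.

    Equals the probability that a randomly chosen attack instance (label 1)
    receives a higher score than a randomly chosen normal instance (label 0);
    tied scores are given the average of their ranks.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices by ascending score
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one attack and one normal instance")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for the attack class
    return u / (n_pos * n_neg)


def f1_score(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """F1-score of the attack class (label 1) after thresholding the scores."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up toy data: 2 attacks among 8 instances, scored by some classifier.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.45, 0.90]

print(round(roc_auc(labels, scores), 4))                  # 0.9167 (= 11/12)
print(f1_score(labels, scores, threshold=0.5))            # 0.5
print(round(f1_score(labels, scores, threshold=0.4), 4))  # 0.6667
```

AUC is computed from the ranking alone, so it does not change with the threshold. F1-score, by contrast, rises from 0.5 to about 0.667 when the threshold is lowered from 0.5 to 0.4 on this toy data. For experiments at the scale of CSE-CIC-IDS2018, scikit-learn's `roc_auc_score` and `f1_score` in `sklearn.metrics` compute the same quantities.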

