{"title":"Detecting SQL Injection Web Attacks Using Ensemble Learners and Data Sampling","authors":"R. Zuech, John T. Hancock, T. Khoshgoftaar","doi":"10.1109/CSR51186.2021.9527990","DOIUrl":null,"url":null,"abstract":"SQL Injection web attacks are a common choice among attackers to exploit web servers. We explore classification performance in detecting SQL Injection web attacks in the recent CSE-CIC-IDS2018 dataset with the Area Under the Receiver Operating Characteristic Curve (AUC) metric for the following seven classifiers: Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Decision Tree (DT), Naive Bayes (NB), and Logistic Regression (LR) (with the first four learners being ensemble learners and for comparison, the last three being single learners). Our unique data preparation of CSE-CID- IDS2018 affords a harsh experimental testbed of class imbalance as encountered in the real world for cybersecurity attacks. To the best of our knowledge, we are the first to apply random undersampling techniques to web attacks from the CSE-CIC- IDS2018 dataset while exploring various sampling ratios. We find the ensemble learners to be the most effective at detecting SQL Injection web attacks, but only after first applying massive data sampling.","PeriodicalId":253300,"journal":{"name":"2021 IEEE International Conference on Cyber Security and Resilience (CSR)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Cyber Security and Resilience (CSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSR51186.2021.9527990","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
SQL Injection web attacks are a common choice among attackers to exploit web servers. We explore classification performance in detecting SQL Injection web attacks in the recent CSE-CIC-IDS2018 dataset with the Area Under the Receiver Operating Characteristic Curve (AUC) metric for the following seven classifiers: Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Decision Tree (DT), Naive Bayes (NB), and Logistic Regression (LR) (with the first four learners being ensemble learners and for comparison, the last three being single learners). Our unique data preparation of CSE-CID- IDS2018 affords a harsh experimental testbed of class imbalance as encountered in the real world for cybersecurity attacks. To the best of our knowledge, we are the first to apply random undersampling techniques to web attacks from the CSE-CIC- IDS2018 dataset while exploring various sampling ratios. We find the ensemble learners to be the most effective at detecting SQL Injection web attacks, but only after first applying massive data sampling.