Deep Reinforcement Learning-based Malicious URL Detection with Feature Selection
Antonio Maci, Nicola Tamma, Anthony J. Coscia
2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC), pp. 1-7, published 2024-02-07
DOI: 10.1109/ICAIC60265.2024.10433827
Data theft through web applications that imitate legitimate platforms is a major network security issue. Countermeasures based on artificial intelligence (AI) are widely applied because they can effectively detect malicious websites, which are vastly outnumbered by legitimate ones. In this domain, deep reinforcement learning (DRL) has emerged as an attractive approach for building network intrusion detection models, including under highly skewed class distributions. However, DRL training time grows with data complexity. This paper combines a DRL-based classifier with state-of-the-art feature selection techniques to speed up training while retaining or even improving classification performance. Our experiments used the Mendeley dataset and five statistical and correlation-based feature-ranking strategies. The results show that the selection technique based on the Gini index reduces the number of columns in the dataset by 27%, saves more than 10% of training time, and significantly improves classification scores compared with training without feature selection.
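The abstract credits a Gini-index ranking with the 27% column reduction but does not give its exact formula or retention rule. The sketch below assumes one common formulation: each feature is scored by the lowest weighted Gini impurity reachable with a single threshold split (lower is better), features are ranked by that score, and the top k columns are kept. The function names, the threshold-split scoring, the fixed-k rule, and the toy rows are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_c p_c^2 of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def feature_gini_index(values: Sequence[float], labels: Sequence[int]) -> float:
    """Lowest weighted Gini impurity over splits 'x <= t' vs 'x > t'.

    Assumed scoring rule: a lower value means the feature separates the
    classes better. Without a valid split, the parent impurity is returned.
    """
    n = len(values)
    best = gini_impurity(labels)
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # degenerate split (one side empty), skip it
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        best = min(best, weighted)
    return best


def rank_features(rows: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> List[Tuple[int, float]]:
    """Return (column index, Gini score) pairs, most discriminative first."""
    scores = [(j, feature_gini_index([row[j] for row in rows], labels))
              for j in range(len(rows[0]))]
    return sorted(scores, key=lambda pair: pair[1])


def select_top_features(rows: Sequence[Sequence[float]],
                        labels: Sequence[int], k: int):
    """Keep the k best-ranked columns (hypothetical fixed-k rule)."""
    keep = sorted(j for j, _ in rank_features(rows, labels)[:k])
    reduced = [[row[j] for j in keep] for row in rows]
    return keep, reduced


# Toy data (hypothetical): column 0 ~ URL length, columns 1-2 ~ binary flags.
rows = [
    [10, 1, 0],
    [12, 0, 0],
    [11, 1, 0],
    [80, 0, 1],
    [95, 1, 1],
    [70, 0, 0],
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = benign, 1 = malicious

kept, reduced = select_top_features(rows, labels, k=2)
print(kept)  # [0, 2]
```

In the toy data, column 0 separates the classes perfectly (score 0.0), column 2 does so partially (0.25), and column 1 barely helps (about 0.44), so keeping k=2 drops column 1. On a real dataset, k or a score cutoff would presumably be tuned against validation scores and training time.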