Reinforcement Learning Based Accurate Detection of Malicious URLs with Multi-Feature Analysis
Xiaoyue Wan, Pengmin Li, Yuhuan Wang, Wei Wei, Liang Xiao
2021 IEEE/CIC International Conference on Communications in China (ICCC), 2021-07-28
DOI: 10.1109/iccc52777.2021.9580433 (https://doi.org/10.1109/iccc52777.2021.9580433)
Citations: 0
Abstract
Malicious URLs lead to malware installation, privacy leakage, and illegal funding on mobile devices and computers. Attackers frequently change the domain names of URLs to evade static detection, so malicious URL detection has to cope with variation in domain name structure. This variation seriously degrades detection accuracy under a fixed detection policy and makes it difficult to derive an optimal policy through theoretical analysis. In this paper, we propose an accurate malicious URL detection scheme that protects Internet users from accessing malicious URLs. The scheme uses a multi-feature analysis architecture to exploit lexical and content-based features and applies reinforcement learning (RL) to choose the detection mode and parameter. We provide a lightweight RL-based detection scheme with transfer learning, as well as a deep RL-based detection scheme that further improves the detection accuracy for servers with sufficient computation resources. We consider malicious URLs whose domain names have specific features, including a long numeric string, a high percentage of numeric characters, or an alphabetic string without syllables. Simulation results show that the scheme improves the detection accuracy and increases the utility compared with the benchmark scheme.
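To make the three domain name features named in the abstract concrete, the following is a minimal sketch of how they might be computed from a URL. It is not the authors' implementation: the function names, the `min_len` parameter, and the use of "a run of letters containing no vowel" as a stand-in for "alphabetic string without syllables" are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Assumption: "y" is treated as a vowel for the syllable heuristic.
VOWELS = set("aeiouy")


def host_of(url: str) -> str:
    """Return the lowercased host of a URL (hypothetical helper)."""
    if "://" not in url:
        url = "//" + url  # let urlparse treat a bare host as a network location
    return (urlparse(url).hostname or "").lower()


def longest_digit_run(host: str) -> int:
    """Length of the longest consecutive numeric string in the host."""
    return max((len(run) for run in re.findall(r"\d+", host)), default=0)


def digit_ratio(host: str) -> float:
    """Fraction of alphanumeric characters in the host that are digits."""
    chars = [c for c in host if c.isalnum()]
    if not chars:
        return 0.0
    return sum(c.isdigit() for c in chars) / len(chars)


def has_vowelless_alpha_run(host: str, min_len: int = 5) -> bool:
    """Crude proxy for an alphabetic string without syllables:
    a run of at least min_len letters that contains no vowel."""
    return any(
        len(run) >= min_len and not (set(run) & VOWELS)
        for run in re.findall(r"[a-z]+", host)
    )


def lexical_features(url: str) -> dict:
    """Collect the three illustrative domain name features for one URL."""
    host = host_of(url)
    return {
        "longest_digit_run": longest_digit_run(host),
        "digit_ratio": digit_ratio(host),
        "vowelless_alpha_run": has_vowelless_alpha_run(host),
    }
```

In a scheme like the one described, thresholds on such features (for example, how long a numeric string must be before it counts as suspicious) would be among the detection parameters that the RL agent selects, rather than being fixed in advance as `min_len` is here.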