Malicious URL Detection and Classification Analysis using Machine Learning Models
Authors: Upendra Shetty D R, Anusha Patil, Mohana Mohana
DOI: 10.1109/IDCIoT56793.2023.10053422 (https://doi.org/10.1109/IDCIoT56793.2023.10053422)
Journal: 物联网技术, volume 15 1, pages 470-476
Publication date: 2023-01-05
Publication type: Journal Article
Citations: 3
Abstract
One of the most frequent cybersecurity threats is the malicious website, reached through a malicious uniform resource locator (URL). Each year, people lose billions of rupees to sites that host unsolicited material (spam, malware, unsuitable adverts, spoofing, etc.) and tempt naïve visitors into falling for scams. Email, adverts, web searches, and links from other websites can all lead people to these sites. In each case the user clicks on a malicious URL, so a trustworthy system that can identify and categorize dangerous URLs is needed, given the rise in phishing, spam, and malware incidents. The enormous amount of data, changing patterns and technologies, complex relationships between features, scarcity of training data, non-linearity, and the presence of outliers make classification challenging. In the proposed work, malicious URLs are detected for various applications. The dataset is divided into four classes: phishing, benign, defacement, and malware. A total of 651,191 URLs were used in the proposed implementation. Three machine learning algorithms, random forest, LightGBM, and XGBoost, were implemented to detect and classify malicious URLs.
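The abstract does not say which features are derived from each URL. As a minimal sketch of the kind of input that tree-based models such as random forest, LightGBM, and XGBoost typically consume, the snippet below turns a raw URL string into a few numeric lexical features. The feature names, the `extract_features` helper, and the choice of features are illustrative assumptions, not the authors' method.

```python
import re
from urllib.parse import urlparse

# The four class labels named in the abstract.
LABELS = ("benign", "phishing", "defacement", "malware")

# Hypothetical helpers: these patterns and the feature set below are
# illustrative assumptions, not the features used in the paper.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.\-]*://")
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url: str) -> dict:
    """Map a raw URL string to a small dictionary of numeric lexical features."""
    # urlparse only fills in the host when a scheme separator is present,
    # so prepend "//" to scheme-less URLs before parsing.
    parsed = urlparse(url if SCHEME_RE.match(url) else "//" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(ch in "@-_?=&%" for ch in url),
        "path_depth": parsed.path.count("/"),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ipv4": int(bool(IPV4_RE.match(host))),
    }


if __name__ == "__main__":
    for example in ("http://192.168.0.1/login.php?id=1", "example.com/a/b"):
        print(example, extract_features(example))
```

Stacked into a matrix with one row per URL and paired with labels from `LABELS`, feature dictionaries like these could be passed to any multi-class classifier, for example scikit-learn's `RandomForestClassifier`, LightGBM's `LGBMClassifier`, or XGBoost's `XGBClassifier`. A realistic pipeline would need a much richer feature set and a held-out evaluation split.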