Automatic Detection of Malicious URLs using Fine-Tuned Classification Model
Chiyu Ding
2020 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT), November 2020
DOI: 10.1109/ISCTT51595.2020.00060
Citations: 1
Abstract
URLs are used constantly to navigate the Internet. Since the outbreak of COVID-19, quarantine has pushed demand for the Internet, and with it the use of URLs, to an unprecedented level. Unfortunately, not every URL is trustworthy: some can attack your computer, steal your personal information, or even spread malware such as Trojans. This project designs machine learning algorithms to detect malicious URLs efficiently and protect Internet users from them. The goal is to obtain a fine-tuned machine learning model. I will first introduce the URL dataset, which is crucial for training the model. Then I will describe the procedure and some findings of my data exploration. After that, I will present the methods, including the algorithms and the improvements I made to optimize the model. Finally, I will present the results and the conclusion. To improve my models, I apply not only hyperparameter tuning but also data resampling and cross-validation. The procedure is repeated several times to ensure the results are stable. To evaluate the performance of my models accurately, I adopt multiple evaluation methods. After these improvements, the F1 score rises significantly, from 0.14 originally to around 0.90. With the final well-trained model, we can accurately predict whether URLs on the Internet are safe, which helps secure our personal information and data.
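The abstract combines three ingredients (data resampling, cross-validation, and the F1 score) without naming the classifiers, the resampling method, or the number of folds. The following is only a minimal standard-library Python sketch under stated assumptions: random oversampling of the minority (malicious) class, stratified k-fold cross-validation, and F1 computed on the malicious class. `fit_bucket_majority` and the single numeric feature (imagine a URL length) are hypothetical stand-ins for a real model and feature set. Resampling is applied to the training folds only, so duplicated rows never leak into the validation fold.

```python
import random
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive (here: malicious) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:  # also avoids division by zero when nothing is predicted positive
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


def stratified_folds(y, k=5, seed=0):
    """Split indices into k folds, dealing each class round-robin so folds keep the class ratio."""
    rng = random.Random(seed)
    by_label = {}
    for i, lab in enumerate(y):
        by_label.setdefault(lab, []).append(i)
    folds = [[] for _ in range(k)]
    for idx in by_label.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_f1(X, y, fit, k=5, resample=True):
    """Mean validation F1 over k stratified folds; oversampling touches training folds only."""
    folds = stratified_folds(y, k)
    scores = []
    for f in range(k):
        val = set(folds[f])
        X_tr = [X[i] for i in range(len(y)) if i not in val]
        y_tr = [y[i] for i in range(len(y)) if i not in val]
        if resample:
            X_tr, y_tr = random_oversample(X_tr, y_tr, seed=f)
        predict = fit(X_tr, y_tr)
        y_val = [y[i] for i in folds[f]]
        y_hat = [predict(X[i]) for i in folds[f]]
        scores.append(f1_score(y_val, y_hat))
    return sum(scores) / k


def fit_bucket_majority(X, y, width=10):
    """Hypothetical toy learner: bucket one numeric feature, predict each bucket's majority label."""
    votes = {}
    for x, lab in zip(X, y):
        votes.setdefault(x // width, Counter())[lab] += 1
    default = Counter(y).most_common(1)[0][0]  # fallback for buckets unseen in training

    def predict(x):
        c = votes.get(x // width)
        return c.most_common(1)[0][0] if c else default

    return predict


if __name__ == "__main__":
    # Synthetic, imbalanced toy data: 10% "malicious" rows, separable by the feature.
    X = list(range(100))
    y = [1 if x >= 90 else 0 for x in X]
    print(cross_validated_f1(X, y, fit_bucket_majority))  # 1.0
    # Always predicting "benign" is 90% accurate on this data, yet its F1 is 0.
    print(cross_validated_f1(X, y, lambda X_tr, y_tr: (lambda x: 0)))  # 0.0
```

The second printed score illustrates why F1 on the malicious class is a more informative yardstick than accuracy when malicious URLs are rare: a model that never flags anything can look accurate while catching nothing. In practice, libraries such as scikit-learn and imbalanced-learn provide tested versions of these pieces; the hand-rolled functions here only make the mechanics visible.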