Aditya Kulkarni, Vivek Balachandran, D. Divakaran, Tamal Das
Published in: 2024 16th International Conference on COMmunication Systems & NETworkS (COMSNETS), pp. 430-432
Publication date: 2024-01-03
DOI: 10.1109/COMSNETS59351.2024.10427170 (https://doi.org/10.1109/COMSNETS59351.2024.10427170)
Citations: 0
Mitigating Bias in Machine Learning Models for Phishing Webpage Detection
The widespread accessibility of the Internet has led to a surge in online fraudulent activities, underscoring the necessity of shielding users' sensitive information from cybercriminals. Phishing, a well-known cyberattack, revolves around the creation of phishing webpages and the dissemination of corresponding URLs, aiming to deceive users into sharing their sensitive information, often for identity theft or financial gain. Various techniques are available for preemptively categorizing zero-day phishing URLs by distilling unique attributes and constructing predictive models. However, these existing techniques encounter unresolved issues. This proposal delves into persistent challenges within phishing detection solutions, particularly concentrated on the preliminary phase of assembling comprehensive datasets, and proposes a potential solution in the form of a tool engineered to alleviate bias in ML models. Such a tool can generate phishing webpages for any given set of legitimate URLs, infusing randomly selected content and visual-based phishing features. Furthermore, we contend that the tool holds the potential to assess the efficacy of existing phishing detection solutions, especially those trained on confined datasets.
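The abstract mentions two ideas that a small sketch may make more concrete: distilling attributes from URLs for a predictive model, and checking how a detector trained on a confined dataset performs on data from another source. The Python below is only an illustration under stated assumptions. The feature set, the function names `url_features` and `recall_by_source`, and the record layout are hypothetical and are not taken from the paper or its tool.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Extract a few illustrative lexical features from a URL.

    The chosen features are assumptions for this sketch, not the
    feature set used by the authors.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for non-IP hosts
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }


def recall_by_source(records):
    """Compute phishing recall separately for each data source.

    records: iterable of (source, is_phishing, predicted_phishing).
    A large gap between sources hints that the detector is biased
    toward the source it was trained on.
    """
    hits, totals = {}, {}
    for source, is_phishing, predicted in records:
        if not is_phishing:
            continue  # recall only considers true phishing samples
        totals[source] = totals.get(source, 0) + 1
        hits[source] = hits.get(source, 0) + int(predicted)
    return {source: hits[source] / totals[source] for source in totals}
```

A detector's predictions on its original test split and on a separately generated set could be passed to `recall_by_source` with different `source` labels; a noticeably lower recall on the second set would be one rough signal of the dataset bias the proposal describes.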