Feature Selections for the Classification of Webpages to Detect Phishing Attacks: A Survey

Mehmet Korkmaz, O. K. Sahingoz, B. Diri
Published in: 2020 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA)
Publication date: 2020-06-01
DOI: 10.1109/HORA49412.2020.9152934 (https://doi.org/10.1109/HORA49412.2020.9152934)
Citation count: 15

Abstract

In recent years, the growing number of Internet-connected devices has moved almost all real-world interactions into the cyberworld. As a result, most commerce, especially e-commerce, is conducted over webpages. The anonymous and uncontrollable structure of the Internet enables malicious use of this environment for a relatively new type of crime, known as e-crime, which mainly aims at illegal financial gain by deceiving ordinary end users. Phishing attacks are among the most widely used fraudulent techniques for obtaining confidential information from end users, such as user IDs, passwords, and credit card details. Network security administrators therefore try to reduce the number of victims in their companies. One principal protection mechanism is the use of blacklists to detect phishing webpages. However, blacklists have a significant deficiency: they offer no protection against attacks from new pages. Most security administrators use learning systems that are trained on a pre-collected dataset, with features extracted from the URL and content of the webpages. The performance of such a system is directly related to the features used for classification. In this work, we analyze the features previously used in webpage classification through a comparative analysis of the literature. The study aims to provide a general survey resource for researchers working on webpage classification or network security.
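As a rough illustration of what extracting features from a URL can look like, the following minimal Python sketch computes a handful of lexical features. The function name `extract_url_features` and the specific features are assumptions chosen for illustration; they are not the feature set catalogued in the survey.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    The feature list is hypothetical and only shows the general idea of
    turning a URL into inputs for a classifier.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Total number of characters in the URL
        "url_length": len(url),
        # Dots in the host name (a rough proxy for subdomain depth)
        "num_dots_in_host": host.count("."),
        # Hyphens in the host name
        "num_hyphens_in_host": host.count("-"),
        # Whether an '@' character appears anywhere in the URL
        "has_at_symbol": "@" in url,
        # Whether the host looks like a dotted-quad IPv4 address
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # Whether the scheme is HTTPS
        "uses_https": parsed.scheme == "https",
        # Count of digit characters in the whole URL
        "num_digits": sum(ch.isdigit() for ch in url),
    }


if __name__ == "__main__":
    print(extract_url_features("https://www.example.com/"))
```

A dictionary like this could be converted to a numeric vector and passed to any standard classifier. Content-based features, which the abstract also mentions, would require fetching and parsing the page and are not shown here.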