{"title":"一种利用基于成本的特征选择检测网络钓鱼网页的新架构","authors":"A. Zangooei, V. Derhami, F. Jamshidi","doi":"10.22044/JADM.2019.7183.1852","DOIUrl":null,"url":null,"abstract":"Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that is considered as feature cost in this paper. Here, two novel features are proposed. They use semantic similarity measure to determine the relationship between the content and the URL of a page. Since suggested features don't apply third-party services such as search engines result, the features extraction time decreases dramatically. Login form pre-filer is utilized to reduce unnecessary calculations and false positive rate. In this paper, a cost-based feature selection is presented as the most effective feature. The selected features are employed in the suggested PWDS. Extreme learning machine algorithm is used to classify webpages. The experimental results demonstrate that suggested PWDS achieves high accuracy of 97.6% and short average detection time of 120.07 milliseconds.","PeriodicalId":32592,"journal":{"name":"Journal of Artificial Intelligence and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection\",\"authors\":\"A. Zangooei, V. Derhami, F. Jamshidi\",\"doi\":\"10.22044/JADM.2019.7183.1852\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that is considered as feature cost in this paper. Here, two novel features are proposed. They use semantic similarity measure to determine the relationship between the content and the URL of a page. Since suggested features don't apply third-party services such as search engines result, the features extraction time decreases dramatically. Login form pre-filer is utilized to reduce unnecessary calculations and false positive rate. In this paper, a cost-based feature selection is presented as the most effective feature. The selected features are employed in the suggested PWDS. Extreme learning machine algorithm is used to classify webpages. The experimental results demonstrate that suggested PWDS achieves high accuracy of 97.6% and short average detection time of 120.07 milliseconds.\",\"PeriodicalId\":32592,\"journal\":{\"name\":\"Journal of Artificial Intelligence and Data Mining\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Artificial Intelligence and Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.22044/JADM.2019.7183.1852\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Artificial Intelligence and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.22044/JADM.2019.7183.1852","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection
Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that is considered as feature cost in this paper. Here, two novel features are proposed. They use semantic similarity measure to determine the relationship between the content and the URL of a page. Since suggested features don't apply third-party services such as search engines result, the features extraction time decreases dramatically. Login form pre-filer is utilized to reduce unnecessary calculations and false positive rate. In this paper, a cost-based feature selection is presented as the most effective feature. The selected features are employed in the suggested PWDS. Extreme learning machine algorithm is used to classify webpages. The experimental results demonstrate that suggested PWDS achieves high accuracy of 97.6% and short average detection time of 120.07 milliseconds.