A hybrid model to detect phishing-sites using clustering and Bayesian approach

International Conference for Convergence for Technology-2014 Pub Date : 2014-04-06 DOI:10.1109/I2CT.2014.7092141

Rahul Patil, Bhushan Dasharath Dhamdhere, Kaushal Sudhakar Dhonde, Rohit Gopal Chinchwade, Swapnil Balasaheb Mehetre


Citation count: 9

Abstract

Phishing sites are a major form of attack through which many internet users are deceived by phishers. Attackers create replicas of legitimate sites and direct users to them with enticing offers. Based on standards published by the W3C (World Wide Web Consortium), we select features that readily distinguish a legitimate site from a phishing site. We propose a model that identifies phishing sites in order to safeguard web users from phishers. The model considers both the features of the URL and the features of the web page's HTML tags. The database is clustered with K-Means clustering, and a Naive Bayes classifier is used to predict the probability that a web site is a Valid Phish or an Invalid Phish. K-Means clustering is first applied to the initial URL features and validity is checked; if the validity of the web site still cannot be determined, the Naive Bayes classifier is applied to the URL features together with the HTML tag features, and the probability is evaluated against the trained model.
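
The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the binary features, the toy data, the labels `"phish"` and `"legit"`, the `purity` threshold used to decide when a cluster is decisive, and all function names are hypothetical. The abstract does not say how an undetermined K-Means result is detected, so the sketch assumes that a cluster counts as decisive only when nearly all of its training members share one label.

```python
import math
from collections import Counter

def nearest(centroids, p):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(centroids[i], p)))

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm, seeded with the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # Recompute each centroid as the mean; keep the old one if a cluster is empty.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids

class BernoulliNB:
    """Naive Bayes over 0/1 features with Laplace smoothing."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: y.count(c) / len(y) for c in self.classes}
        self.p = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(len(X[0]))]
        return self

    def predict_proba(self, x):
        logp = {c: math.log(self.prior[c]) +
                   sum(math.log(pj if v else 1 - pj) for pj, v in zip(self.p[c], x))
                for c in self.classes}
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return {c: math.exp(v - m) / z for c, v in logp.items()}

def train(url_feats, html_feats, labels, k=2, purity=0.9):
    """Stage 1: cluster URL features. Stage 2: fit NB on URL + HTML features."""
    centroids = kmeans(url_feats, k)
    votes = [Counter() for _ in range(k)]
    for x, y in zip(url_feats, labels):
        votes[nearest(centroids, x)][y] += 1
    verdicts = []
    for v in votes:
        label, count = v.most_common(1)[0] if v else (None, 0)
        total = sum(v.values())
        # Assumption: a cluster is decisive only if it is nearly pure.
        verdicts.append(label if total and count / total >= purity else None)
    nb = BernoulliNB().fit([u + h for u, h in zip(url_feats, html_feats)], labels)
    return centroids, verdicts, nb

def classify(model, url, html):
    """Return (label, stage) where stage names the step that decided."""
    centroids, verdicts, nb = model
    verdict = verdicts[nearest(centroids, url)]
    if verdict is not None:
        return verdict, "kmeans"
    proba = nb.predict_proba(url + html)
    return max(proba, key=proba.get), "naive_bayes"

# Hypothetical toy data. URL features: [has_ip, has_at_sign, long_url];
# HTML features: [foreign_form_action, hidden_iframe].
URLS = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 0, 1], [0, 0, 1]]
HTML = [[1, 1], [0, 0], [1, 0], [1, 1], [0, 1], [0, 0], [1, 1]]
LABELS = ["phish", "legit", "phish", "phish", "legit", "legit", "phish"]

model = train(URLS, HTML, LABELS)
print(classify(model, [1, 1, 1], [0, 0]))
print(classify(model, [0, 0, 1], [1, 1]))
```

In this toy setup, a URL that lands in a cluster of consistently labeled training sites is decided by K-Means alone, while a URL in a mixed cluster falls through to the Naive Bayes step, which also takes the HTML tag features into account.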
