Classification of Malicious Web Pages through a J48 Decision Tree, a Naïve Bayes, a RBF Network and a Random Forest Classifier for WebSpam Detection

Muhammad Iqbal, Malik Muneeb Abid, U. Waheed, S. Kazmi
International Journal of u- and e- Service, Science and Technology, published 2017-05-04. DOI: 10.14257/IJUNESST.2017.10.4.05 (https://doi.org/10.14257/IJUNESST.2017.10.4.05)
Citations: 3

Abstract

Web spam is a deceptive practice in which spammers manipulate search engine results to improve the ranking of their Web pages. It appears on the World Wide Web (WWW) in many forms and lacks a consistent definition. Search engines try to eliminate spam pages with machine learning (ML) detectors. They mostly measure the quality of websites using factors (signals) such as number of visitors, body text, anchor text, back links, and forward links, and spammers try to inject these signals into their target pages to subvert the search engines' ranking function. This study compares the detection efficiency of different ML classifiers trained and tested on the WebSpam UK2007 data set. The results show that random forest achieves a higher score than the other well-known classifiers.
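The kind of comparison the abstract describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' setup: it uses scikit-learn rather than whatever toolkit the study used, a synthetic imbalanced data set stands in for WebSpam UK2007, scikit-learn's CART decision tree stands in for J48 (a C4.5 implementation), the RBF network is omitted because scikit-learn has no direct equivalent, and ROC AUC under 5-fold cross-validation is an assumed choice of score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for page features (assumption: NOT the WebSpam UK2007 data).
# About 10% of samples are labeled 1 ("spam") to mimic class imbalance.
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=8,
    weights=[0.9, 0.1],
    random_state=0,
)

# Classifiers to compare. The decision tree here is CART, not J48/C4.5.
classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Stratified folds keep the spam/non-spam ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def compare(models, X, y, cv):
    """Return the mean cross-validated ROC AUC for each named model."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        results[name] = float(scores.mean())
    return results


results = compare(classifiers, X, y, cv)
for name, auc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

Scores on this synthetic data say nothing about the paper's results; the sketch only shows the shape of a like-for-like comparison, with every classifier evaluated on the same folds and the same metric.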