Spam detection with a content-based random-walk algorithm

SMUC '10 · Pub Date: 2010-10-30 · DOI: 10.1145/1871985.1871994
F. Javier Ortega, C. Macdonald, J. A. Troyano, Fermín L. Cruz
{"title":"基于内容的随机漫步算法的垃圾邮件检测","authors":"F. Javier Ortega, C. Macdonald, J. A. Troyano, Fermín L. Cruz","doi":"10.1145/1871985.1871994","DOIUrl":null,"url":null,"abstract":"In this work we tackle the problem of the spam detection on the Web. Spam web pages have become a problem for Web search engines, due to the negative effects that this phenomenon can cause in their retrieval results. Our approach is based on a random-walk algorithm that obtains a ranking of pages according to their relevance and their spam likelihood. We introduce the novelty of taking into account the content of the web pages to characterize the web graph and to obtain an a-priori estimation of the spam likekihood of the web pages. Our graph-based algorithm computes two scores for each node in the graph. Intuitively, these values represent how bad or good (spam-like or not) is a web page, according to its textual content and the relations in the graph. Our experiments show that our proposed technique outperforms other link-based techniques for spam detection.","PeriodicalId":244822,"journal":{"name":"SMUC '10","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Spam detection with a content-based random-walk algorithm\",\"authors\":\"F. Javier Ortega, C. Macdonald, J. A. Troyano, Fermín L. Cruz\",\"doi\":\"10.1145/1871985.1871994\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work we tackle the problem of the spam detection on the Web. Spam web pages have become a problem for Web search engines, due to the negative effects that this phenomenon can cause in their retrieval results. Our approach is based on a random-walk algorithm that obtains a ranking of pages according to their relevance and their spam likelihood. We introduce the novelty of taking into account the content of the web pages to characterize the web graph and to obtain an a-priori estimation of the spam likekihood of the web pages. Our graph-based algorithm computes two scores for each node in the graph. Intuitively, these values represent how bad or good (spam-like or not) is a web page, according to its textual content and the relations in the graph. Our experiments show that our proposed technique outperforms other link-based techniques for spam detection.\",\"PeriodicalId\":244822,\"journal\":{\"name\":\"SMUC '10\",\"volume\":\"85 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SMUC '10\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1871985.1871994\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SMUC '10","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1871985.1871994","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9

Abstract

In this work we tackle the problem of spam detection on the Web. Spam web pages have become a problem for Web search engines, due to the negative effects that this phenomenon can cause in their retrieval results. Our approach is based on a random-walk algorithm that obtains a ranking of pages according to their relevance and their spam likelihood. We introduce the novelty of taking into account the content of the web pages to characterize the web graph and to obtain an a priori estimation of the spam likelihood of the web pages. Our graph-based algorithm computes two scores for each node in the graph. Intuitively, these values represent how good or bad (spam-like or not) a web page is, according to its textual content and its relations in the graph. Our experiments show that our proposed technique outperforms other link-based techniques for spam detection.
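
The abstract describes the method only at a high level, so the sketch below is a minimal, hypothetical illustration of the general idea: a personalized-PageRank-style random walk that propagates two scores per page ("good" and "spam"), each seeded by a content-based a priori estimate and spread along the link graph. The function name, the damping factor, the prior vectors, and the way the two scores are combined into a ranking are all assumptions for illustration, not the authors' actual formulation.

```python
# Hypothetical sketch of a two-score random-walk spam ranking.
# Assumes a personalized-PageRank-style propagation seeded by content-based
# priors; the paper's exact propagation and prior estimation may differ.
import numpy as np

def two_score_random_walk(adj, good_prior, spam_prior, damping=0.85, iters=50):
    """Propagate 'good' and 'spam' scores over a web graph.

    adj        : (n, n) adjacency matrix, adj[i, j] = 1 if page i links to j.
    good_prior : length-n non-negative vector from a content-based estimator
                 (higher = more likely legitimate); normalized internally.
    spam_prior : length-n non-negative vector (higher = more spam-like).
    """
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; dangling pages jump uniformly.
    trans = np.where(out_deg > 0, adj / np.maximum(out_deg, 1), 1.0 / n)

    good = good_prior / good_prior.sum()
    spam = spam_prior / spam_prior.sum()
    g, s = good.copy(), spam.copy()
    for _ in range(iters):
        # Scores flow along links; the content prior acts as the teleport vector.
        g = damping * (trans.T @ g) + (1 - damping) * good
        s = damping * (trans.T @ s) + (1 - damping) * spam
    return g, s

if __name__ == "__main__":
    # Toy 4-page graph with hypothetical content-based prior estimates.
    adj = np.array([[0, 1, 1, 0],
                    [0, 0, 1, 0],
                    [1, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    good_prior = np.array([0.9, 0.8, 0.7, 0.1])
    spam_prior = np.array([0.1, 0.2, 0.3, 0.9])
    g, s = two_score_random_walk(adj, good_prior, spam_prior)
    ranking = np.argsort(-(g - s))
    print("pages ranked (least spam-like first):", ranking)
```

In this sketch the final ranking simply combines the two scores (good minus spam) so that likely spam pages are demoted; the combination rule used in the paper may differ.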