Web Robot Detection: A Semantic Approach

Athanasios Lagopoulos, Grigorios Tsoumakas, Georgios Papadopoulos
2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), 2018-11-01
DOI: 10.1109/ICTAI.2018.00150 (https://doi.org/10.1109/ICTAI.2018.00150)
Citations: 10

Abstract

Web robots now account for more than half of all web traffic. Malicious robots threaten the security, privacy and performance of the web, while non-malicious ones skew analytics. The latter is an important problem for large websites with unique content, as it can give a false impression of the popularity and impact of a piece of information. To deal with this problem, we present a novel web robot detection approach for content-rich websites, based on the assumption that human web users are interested in specific topics, while web robots crawl the web randomly. Our approach extends the typical representation of user sessions with a novel set of features that capture the semantics of the content of the requested resources. Empirical results on real-world data from the web portal of an academic publisher show that the proposed semantic features improve web robot detection accuracy.
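
The abstract does not specify the semantic features themselves, so the following is only an illustrative sketch of the stated assumption (humans stay on specific topics, robots crawl at random). It assumes each requested resource already has a topic-distribution vector, and it scores a session by the mean pairwise cosine similarity of those vectors. The names `cosine` and `topic_coherence`, the toy vectors, and the choice of cosine similarity are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_coherence(session_topics):
    """Mean pairwise cosine similarity over the topic vectors of one session.

    Hypothetical feature: a high value means the requested resources share
    topics, a low value means they are topically unrelated.
    """
    pairs = list(combinations(session_topics, 2))
    if not pairs:
        # Assumption: a session with a single request counts as fully coherent.
        return 1.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy sessions: one topic vector per requested resource (made-up numbers).
focused = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.85, 0.1, 0.05]]
scattered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    print("focused session coherence:  ", topic_coherence(focused))
    print("scattered session coherence:", topic_coherence(scattered))
```

Under these assumptions, a score like this would be appended to the usual per-session features (request counts, timing, and so on) before training a classifier. Whether it helps on real data is an empirical question that this sketch does not answer.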