Focusing Web Crawls On Location-Specific Content

Lefteris Kozanidis, S. Stamou, G. Spiros
Int. J. Web Appl., published 2016-11-24. DOI: 10.5220/0001823002440249 (https://doi.org/10.5220/0001823002440249)

Abstract

Retrieving relevant data for location-sensitive keyword queries is a challenging task that has so far been treated as a problem of automatically determining the geographical orientation of web searches. Identifying localizable queries is not, however, sufficient by itself for successful location-sensitive search: there must also be a geo-referenced index of data sources against which localizable queries are searched. In this paper, we propose a novel approach to the automatic construction of a geo-referenced search engine index. Our approach relies on a geo-focused crawler that incorporates a structural parser and uses GeoWordNet as a knowledge base in order to automatically deduce the geo-spatial information that is latent in the pages’ contents. Based on location-descriptive elements in the page URLs and anchor text, the crawler directs the pages to a location-sensitive downloader. This downloading module resolves the geographical references of the URL location elements and organizes them into indexable hierarchical structures. The location-aware URL hierarchies are linked to their respective pages, resulting in a geo-referenced index against which location-sensitive queries can be answered.
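The pipeline described in the abstract (location elements in URLs and anchor text, resolved into hierarchies that are linked back to pages) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the GAZETTEER dictionary is a hypothetical toy stand-in for GeoWordNet, and the function names location_tokens, hierarchy, and build_index, as well as the example URL, are assumptions introduced only for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical toy gazetteer standing in for GeoWordNet (assumption):
# each place maps to its parent place, or None for a top-level region.
GAZETTEER = {
    "europe": None,
    "greece": "europe",
    "patras": "greece",
    "athens": "greece",
}


def location_tokens(url, anchor_text=""):
    """Return gazetteer terms found in the URL host, URL path, and anchor
    text, in order of first appearance and without duplicates."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc} {parsed.path} {anchor_text}".lower()
    found = []
    for token in re.split(r"[^a-z]+", raw):
        if token in GAZETTEER and token not in found:
            found.append(token)
    return found


def hierarchy(place):
    """Follow parent links upward and return the path broadest-first."""
    path = []
    current = place
    while current is not None:
        path.append(current)
        current = GAZETTEER[current]
    return path[::-1]


def build_index(pages):
    """Map each location hierarchy (as a tuple) to the URLs that mention it.

    `pages` is an iterable of (url, anchor_text) pairs.
    """
    index = {}
    for url, anchor in pages:
        for place in location_tokens(url, anchor):
            index.setdefault(tuple(hierarchy(place)), []).append(url)
    return index


if __name__ == "__main__":
    pages = [("http://example.com/greece/patras/hotels", "Hotels in Patras")]
    for path, urls in build_index(pages).items():
        print(" > ".join(path), urls)
```

A location-sensitive query could then be answered by resolving its place name to a hierarchy and looking up that key. A real system would also need disambiguation of place names that share a spelling, which this sketch ignores.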