Self-supervised web search for any-k complete tuples

BEWEB '11 · Pub. date: 2011-03-25 · DOI: 10.1145/1966883.1966889

Alexander Löser, Christoph Nagel, Stephan Pieper, Christoph Boden


Citations: 4

Abstract

A common task of Web users is querying structured information from Web pages. In this paper we propose a novel query processor that answers conjunctive queries by systematically discovering any-k relations in Web search results. The phrase 'any-k' denotes that the system does not rank the retrieved tuples.

To realize this scenario, the query processor translates a structured query into keyword queries, submits them to a search engine, forwards the search results to relation extractors, and then combines the extracted relations into result tuples.

Unfortunately, relation extractors may fail to return a relation for a result tuple. We propose an information-theoretic approach for retrieving the missing attribute values of partially retrieved relations. Moreover, user-defined data sources may return fewer than k complete result tuples. To solve this problem, we extend the Eddy query processing mechanism [14] to our 'querying the Web' scenario with a continuous, adaptive routing model. At any point during query execution, the model determines the most promising incomplete row to process next in order to return any k complete result tuples.

We report a thorough experimental evaluation over multiple relation extractors. Our experiments demonstrate that our query processor returns complete result tuples while processing only very few Web pages.
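The step of combining extracted relations into result tuples, and stopping once any k exist, can be illustrated with a symmetric hash join. This is a minimal sketch under stated assumptions: the conjunctive query is taken to be a single join R(a, b) ⋈ S(a, c), each relation extractor is modeled as a function from page text to (join key, value) pairs, and the names `any_k_join`, `extract_r`, and `extract_s` are hypothetical, not the paper's API.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical extractor interface: page text -> list of (join key, attribute value).
Extractor = Callable[[str], List[Tuple[str, str]]]


def any_k_join(pages_r: Iterable[str], pages_s: Iterable[str],
               extract_r: Extractor, extract_s: Extractor,
               k: int) -> List[Tuple[str, str, str]]:
    """Symmetric hash join of R(a, b) and S(a, c) on a, stopping after any k tuples."""
    seen_r: Dict[str, List[str]] = {}  # a -> b values extracted so far
    seen_s: Dict[str, List[str]] = {}  # a -> c values extracted so far
    results: List[Tuple[str, str, str]] = []
    iters = [iter(pages_r), iter(pages_s)]
    active = [True, True]
    while any(active):
        # Alternate between the two page streams (e.g. two search result lists).
        for side in (0, 1):
            if not active[side]:
                continue
            page = next(iters[side], None)
            if page is None:
                active[side] = False
                continue
            extract = extract_r if side == 0 else extract_s
            own, other = (seen_r, seen_s) if side == 0 else (seen_s, seen_r)
            for key, value in extract(page):
                own.setdefault(key, []).append(value)
                # Probe the opposite hash table; always emit (a, b, c).
                for match in other.get(key, []):
                    b, c = (value, match) if side == 0 else (match, value)
                    results.append((key, b, c))
                    if len(results) >= k:
                        return results  # any k tuples suffice: no ranking
    return results
```

Because the join returns as soon as k tuples exist, pages later in either stream are never pulled from the iterator, which is one way to keep the number of processed pages small.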
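The abstract does not say which information-theoretic measure is used to retrieve missing attribute values. As one hypothetical illustration, a system could extend the keyword query for a missing value with the term that carries the highest information gain about whether a result page yields the target relation. The page representation (a set of terms plus a flag saying whether an extractor found the relation), the use of information gain, and the names `information_gain` and `keyword_query` are all assumptions, not the authors' method.

```python
import math
from typing import List, Set, Tuple

# Hypothetical training signal: (terms on a result page, did an extractor find the relation?)
Page = Tuple[Set[str], bool]


def entropy(pos: int, total: int) -> float:
    """Binary entropy in bits of a label with `pos` positives among `total` items."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(pages: List[Page], term: str) -> float:
    """IG = H(Y) - H(Y | term present or absent), Y = 'page yields the relation'."""
    n = len(pages)
    if n == 0:
        return 0.0
    with_t = [y for terms, y in pages if term in terms]
    without_t = [y for terms, y in pages if term not in terms]
    h_y = entropy(sum(y for _, y in pages), n)
    h_cond = (len(with_t) / n) * entropy(sum(with_t), len(with_t)) \
        + (len(without_t) / n) * entropy(sum(without_t), len(without_t))
    return h_y - h_cond


def keyword_query(entity: str, pages: List[Page], candidates: List[str]) -> str:
    """Build a keyword query for a missing value: the known entity plus the top-IG term."""
    best = max(candidates, key=lambda t: information_gain(pages, t))
    return f'"{entity}" {best}'
```

For example, if every page mentioning "ceo" yielded a CEO relation and no other page did, "ceo" would have the maximal gain of 1 bit and would be appended to the entity name.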
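The adaptive routing idea can be sketched in the spirit of an Eddy: per-attribute success statistics are updated after every lookup, and the router always picks the incomplete row with the highest estimated probability of becoming complete. This is a hypothetical sketch rather than the paper's routing model; the scoring rule (a product of Laplace-smoothed fill rates, assuming attributes succeed independently), the `lookup` callback, and all names are assumptions.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, Optional[str]]
# Hypothetical: lookup(row, attr) issues a keyword query for the missing value and
# returns the extracted value, or None if the extractor finds nothing.
Lookup = Callable[[Row, str], Optional[str]]


class AdaptiveRouter:
    """Eddy-inspired routing statistics, updated after every lookup."""

    def __init__(self, attributes: List[str]) -> None:
        # Laplace-smoothed prior: 1 success out of 2 attempts per attribute.
        self.successes = {a: 1 for a in attributes}
        self.attempts = {a: 2 for a in attributes}

    def p_fill(self, attr: str) -> float:
        return self.successes[attr] / self.attempts[attr]

    def score(self, row: Row) -> float:
        # Estimated probability that all missing attributes of the row get filled.
        p = 1.0
        for attr, value in row.items():
            if value is None:
                p *= self.p_fill(attr)
        return p

    def record(self, attr: str, success: bool) -> None:
        self.attempts[attr] += 1
        self.successes[attr] += int(success)


def is_complete(row: Row) -> bool:
    return all(v is not None for v in row.values())


def complete_any_k(rows: List[Row], k: int, lookup: Lookup, budget: int) -> List[Row]:
    """Fill missing values until k rows are complete or the lookup budget is spent."""
    router = AdaptiveRouter(list(rows[0].keys()))
    failed: Set[Tuple[int, str]] = set()  # (row index, attribute) lookups that returned None
    for _ in range(budget):
        done = [r for r in rows if is_complete(r)]
        if len(done) >= k:
            return done[:k]
        # A row stays a candidate only while none of its missing attributes has failed.
        candidates = [
            i for i, r in enumerate(rows)
            if not is_complete(r)
            and not any((i, a) in failed for a, v in r.items() if v is None)
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda i: router.score(rows[i]))
        row = rows[best]
        attr = max((a for a, v in row.items() if v is None), key=router.p_fill)
        value = lookup(row, attr)
        router.record(attr, value is not None)
        if value is None:
            failed.add((best, attr))
        else:
            row[attr] = value
    return [r for r in rows if is_complete(r)][:k]
```

Rows whose lookups fail drop out of the candidate set, so the remaining budget is spent on rows that can still become complete, while the shared statistics shift future choices toward attributes that extractors fill reliably.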


