Federated Search

Impact factor 12.9 · Region 2, Computer Science · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval Pub Date : 2011-03-05 DOI:10.1561/1500000010

Milad Shokouhi, Luo Si

Citations: 167

Abstract

Federated search (also called federated information retrieval or distributed information retrieval) is a technique for searching multiple text collections simultaneously. Queries are submitted to the subset of collections most likely to return relevant answers, and the results returned by the selected collections are merged into a single list. Federated search is preferred over centralized search in many environments. For example, commercial search engines such as Google cannot easily index uncrawlable hidden web collections, whereas federated search systems can search the contents of hidden web collections without crawling them. In enterprise environments, where each organization maintains an independent search engine, federated search techniques can provide parallel search over multiple collections.

There are three major challenges in federated search. For each query, the subset of collections most likely to return relevant documents must be selected; this is the collection selection problem. To select suitable collections, a federated search system needs some knowledge about the contents of each collection; acquiring that knowledge is the collection representation problem. Finally, the results returned from the selected collections are merged before they are presented to the user; this last step is the result merging problem.

The goal of this work is to provide a comprehensive summary of previous research on the federated search challenges described above.
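The three steps named in the abstract (representation, selection, merging) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names, the sampled-document term counts used as a representation, the relative-frequency selection score, and the min-max score normalization are deliberately simple stand-ins and are not the specific methods covered by the survey.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical type for this sketch: an engine maps a query to (doc_id, score) pairs.
SearchFn = Callable[[str], List[Tuple[str, float]]]


def build_representation(sample_docs: List[str]) -> Counter:
    """Collection representation: term counts over a sample of documents."""
    counts: Counter = Counter()
    for doc in sample_docs:
        counts.update(doc.lower().split())
    return counts


def select_collections(query: str, reps: Dict[str, Counter], k: int) -> List[str]:
    """Collection selection: rank collections by relative frequency of query terms."""
    terms = query.lower().split()

    def score(rep: Counter) -> float:
        total = sum(rep.values()) or 1  # avoid division by zero for empty samples
        return sum(rep[t] for t in terms) / total

    ranked = sorted(reps, key=lambda name: score(reps[name]), reverse=True)
    return ranked[:k]


def merge_results(
    per_collection: Dict[str, List[Tuple[str, float]]]
) -> List[Tuple[str, float]]:
    """Result merging: min-max normalize scores per collection, then sort globally."""
    merged: List[Tuple[str, float]] = []
    for results in per_collection.values():
        if not results:
            continue
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        for doc_id, s in results:
            # If all scores are equal, treat every result as a top hit (toy choice).
            norm = 1.0 if hi == lo else (s - lo) / (hi - lo)
            merged.append((doc_id, norm))
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


def federated_search(
    query: str,
    samples: Dict[str, List[str]],
    engines: Dict[str, SearchFn],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Run the three steps end to end and query only the selected collections."""
    reps = {name: build_representation(docs) for name, docs in samples.items()}
    selected = select_collections(query, reps, k)
    return merge_results({name: engines[name](query) for name in selected})
```

Min-max normalization is used here only because raw scores from independent engines are generally not comparable; a real system would need a more careful merging strategy.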

Source journal

Foundations and Trends in Information Retrieval (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore

39.10

Self-citation rate

0.00%

About the journal: The surge in research across all domains over the past decade has produced a rapidly growing body of publications. Navigating this literature and staying current has become a time-consuming challenge. Electronic publishing provides instant access to more articles than ever, but identifying the essential ones for a comprehensive understanding of a topic remains difficult. Foundations and Trends® in Information Retrieval (FnTIR) addresses this problem by publishing high-quality survey and tutorial monographs in the field. Each issue features a 50-100 page monograph authored by research leaders, covering tutorial subjects, research retrospectives, and survey papers that provide state-of-the-art reviews within the scope of the journal.