Pseudo Descriptions for Meta-Data Retrieval

Tim Gollub, E. Genc, Nedim Lipka, Benno Stein
Published in: Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval
Publication date: 2018-09-10
DOI: 10.1145/3234944.3234957 (https://doi.org/10.1145/3234944.3234957)
Citations: 1

Abstract

Search in meta-data is challenging because the available textual information is sparse. To alleviate the sparsity problem, this paper builds on the existing document expansion paradigm and proposes pseudo-descriptions as a new paradigm. Instead of encoding paradigmatic term relations implicitly in an expansion vector, we generate an explicit, cohesive text field for each meta-data record that describes the entity associated with the record. In contrast to document expansions, pseudo-descriptions make it possible to reveal why a certain document is considered relevant even though the original meta-data does not contain the query terms. Moreover, they are easier to operationalize and facilitate the use of sophisticated retrieval features such as phrase search and query term proximity. To generate pseudo-descriptions, we propose a relevance-dependent strategy that relies on the search engine result pages obtained by issuing the meta-data as a search query to a designated reference search engine. To demonstrate the validity of the pseudo-description paradigm, we experiment with different TREC collections from which we withhold the content information to simulate a meta-data retrieval scenario. Though retrieval with full content information remains superior, our approach achieves retrieval performance improvements on par with document expansion.
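
The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how result pages are turned into a description, so the toy in-memory reference engine, the term-overlap ranking, the top-k concatenation, and all names (MetaRecord, search, add_pseudo_description, phrase_match, REFERENCE_CORPUS) are assumptions made for illustration only.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

PUNCT = ".,;:!?()\"'"

# Hypothetical stand-in for the result snippets of a reference search engine.
REFERENCE_CORPUS = [
    "Moby Dick is a novel by Herman Melville about Captain Ahab and his hunt for the white whale.",
    "Herman Melville wrote sea stories; Moby Dick follows the whaling ship Pequod.",
    "A cookbook of seasonal vegetable recipes.",
]


@dataclass
class MetaRecord:
    title: str
    author: str
    pseudo_description: str = ""  # explicit text field added to the record


def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace, strip surrounding punctuation.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def search(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Assumed ranking: count occurrences of query terms in each text.
    query_terms = set(tokenize(query))
    scored = []
    for doc in corpus:
        counts = Counter(tokenize(doc))
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def add_pseudo_description(record: MetaRecord, corpus: list[str], k: int = 2) -> MetaRecord:
    # Issue the meta-data itself as the query, then join the top-k results
    # into one readable text field (an assumed, simplistic aggregation).
    query = f"{record.title} {record.author}"
    record.pseudo_description = " ".join(search(query, corpus, k))
    return record


def phrase_match(phrase: str, record: MetaRecord) -> bool:
    # Phrase search over title plus pseudo-description, on token boundaries.
    text = " ".join(tokenize(f"{record.title} {record.pseudo_description}"))
    return f" {' '.join(tokenize(phrase))} " in f" {text} "


record = add_pseudo_description(MetaRecord("Moby Dick", "Herman Melville"), REFERENCE_CORPUS)
print(record.pseudo_description)
print(phrase_match("white whale", record))
```

In this toy setting the bare record (title and author only) cannot match the phrase query "white whale", while the record with its pseudo-description can, and the stored text itself shows why the record matched. Because the added field is ordinary text rather than an expansion vector, phrase and proximity operators of a standard engine could be applied to it directly.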