概率关联框架下的泛化翻译模型

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI:10.1145/2983323.2983833

Navid Rekabsaz, M. Lupu, A. Hanbury, G. Zuccon

{"title":"概率关联框架下的泛化翻译模型","authors":"Navid Rekabsaz, M. Lupu, A. Hanbury, G. Zuccon","doi":"10.1145/2983323.2983833","DOIUrl":null,"url":null,"abstract":"A recurring question in information retrieval is whether term associations can be properly integrated in traditional information retrieval models while preserving their robustness and effectiveness. In this paper, we revisit a wide spectrum of existing models (Pivoted Document Normalization, BM25, BM25 Verboseness Aware, Multi-Aspect TF, and Language Modelling) by introducing a generalisation of the idea of the translation model. This generalisation is a de facto transformation of the translation models from Language Modelling to the probabilistic models. In doing so, we observe a potential limitation of these generalised translation models: they only affect the term frequency based components of all the models, ignoring changes in document and collection statistics. We correct this limitation by extending the translation models with the 15 statistics of term associations and provide extensive experimental results to demonstrate the benefit of the newly proposed methods. Additionally, we compare the translation models with query expansion methods based on the same term association resources, as well as based on Pseudo-Relevance Feedback (PRF). We observe that translation models always outperform the first, but provide complementary information with the second, such that by using PRF and our translation models together we observe results better than the current state of the art.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"117 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Generalizing Translation Models in the Probabilistic Relevance Framework\",\"authors\":\"Navid Rekabsaz, M. Lupu, A. Hanbury, G. Zuccon\",\"doi\":\"10.1145/2983323.2983833\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A recurring question in information retrieval is whether term associations can be properly integrated in traditional information retrieval models while preserving their robustness and effectiveness. In this paper, we revisit a wide spectrum of existing models (Pivoted Document Normalization, BM25, BM25 Verboseness Aware, Multi-Aspect TF, and Language Modelling) by introducing a generalisation of the idea of the translation model. This generalisation is a de facto transformation of the translation models from Language Modelling to the probabilistic models. In doing so, we observe a potential limitation of these generalised translation models: they only affect the term frequency based components of all the models, ignoring changes in document and collection statistics. We correct this limitation by extending the translation models with the 15 statistics of term associations and provide extensive experimental results to demonstrate the benefit of the newly proposed methods. Additionally, we compare the translation models with query expansion methods based on the same term association resources, as well as based on Pseudo-Relevance Feedback (PRF). We observe that translation models always outperform the first, but provide complementary information with the second, such that by using PRF and our translation models together we observe results better than the current state of the art.\",\"PeriodicalId\":250808,\"journal\":{\"name\":\"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management\",\"volume\":\"117 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2983323.2983833\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2983323.2983833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 25

摘要

在信息检索中，一个反复出现的问题是，术语关联能否在保持传统信息检索模型鲁棒性和有效性的同时，适当地集成到传统的信息检索模型中。在本文中，我们通过引入翻译模型思想的概括，重新审视了广泛的现有模型(pivot Document Normalization, BM25, BM25 verbose Aware, Multi-Aspect TF和语言建模)。这种概括实际上是翻译模型从语言模型到概率模型的转换。在这样做的过程中，我们观察到这些泛化翻译模型的一个潜在限制:它们只影响所有模型中基于术语频率的成分，忽略了文档和集合统计的变化。我们通过扩展具有15个术语关联统计的翻译模型来纠正这一限制，并提供了广泛的实验结果来证明新提出的方法的好处。此外，我们将翻译模型与基于相同术语关联资源的查询扩展方法以及基于伪相关反馈(PRF)的查询扩展方法进行了比较。我们观察到，翻译模型总是优于第一种模型，但提供了与第二种模型互补的信息，因此，通过将PRF和我们的翻译模型一起使用，我们观察到的结果比目前的技术水平更好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Generalizing Translation Models in the Probabilistic Relevance Framework

A recurring question in information retrieval is whether term associations can be properly integrated in traditional information retrieval models while preserving their robustness and effectiveness. In this paper, we revisit a wide spectrum of existing models (Pivoted Document Normalization, BM25, BM25 Verboseness Aware, Multi-Aspect TF, and Language Modelling) by introducing a generalisation of the idea of the translation model. This generalisation is a de facto transformation of the translation models from Language Modelling to the probabilistic models. In doing so, we observe a potential limitation of these generalised translation models: they only affect the term frequency based components of all the models, ignoring changes in document and collection statistics. We correct this limitation by extending the translation models with the 15 statistics of term associations and provide extensive experimental results to demonstrate the benefit of the newly proposed methods. Additionally, we compare the translation models with query expansion methods based on the same term association resources, as well as based on Pseudo-Relevance Feedback (PRF). We observe that translation models always outperform the first, but provide complementary information with the second, such that by using PRF and our translation models together we observe results better than the current state of the art.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

自引率

0.00%

发文量