在用于文档检索的语言模型中合并查询词依赖关系

Munirathnam Srikanth, R. Srihari
{"title":"在用于文档检索的语言模型中合并查询词依赖关系","authors":"Munirathnam Srikanth, R. Srihari","doi":"10.1145/860435.860523","DOIUrl":null,"url":null,"abstract":"Recent advances in Information Retrieval are based on using Statistical Language Models (SLM) for representing documents and evaluating their relevance to user queries [6, 3, 4]. Language Modeling (LM) has been explored in many natural language tasks including machine translation and speech recognition [1]. In LM approach to document retrieval, each document, D, is viewed to have its own language model, MD. Given a query, Q, documents are ranked based on the probability, P (Q|MD), of their language model generating the query. While the LM approach to information retrieval has been motivated from different perspectives [3, 4], most experiments have used smoothed unigram language models that assume term independence for estimating document language models. N-gram, specifically, bigram language models that capture context provided by the previous word(s) perform better than unigram models [7]. Biterm language models [8] that ignore the word order constraint in bigram language models have been shown to perform better than bigram models. However, word order constraint cannot always be relaxed since a blind venetian is not a venetian blind. Term dependencies can be measured using their co-occurrence statistics. Nallapati and Allan [5] represent term dependencies in a sentence using a maximum spanning tree and generate a sentence tree language model for the story link detection task in TDT. Syntactic parse of user queries can provide clues for when the word order constraint can be relaxed. Syn-","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"Incorporating query term dependencies in language models for document retrieval\",\"authors\":\"Munirathnam Srikanth, R. Srihari\",\"doi\":\"10.1145/860435.860523\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advances in Information Retrieval are based on using Statistical Language Models (SLM) for representing documents and evaluating their relevance to user queries [6, 3, 4]. Language Modeling (LM) has been explored in many natural language tasks including machine translation and speech recognition [1]. In LM approach to document retrieval, each document, D, is viewed to have its own language model, MD. Given a query, Q, documents are ranked based on the probability, P (Q|MD), of their language model generating the query. While the LM approach to information retrieval has been motivated from different perspectives [3, 4], most experiments have used smoothed unigram language models that assume term independence for estimating document language models. N-gram, specifically, bigram language models that capture context provided by the previous word(s) perform better than unigram models [7]. Biterm language models [8] that ignore the word order constraint in bigram language models have been shown to perform better than bigram models. However, word order constraint cannot always be relaxed since a blind venetian is not a venetian blind. Term dependencies can be measured using their co-occurrence statistics. Nallapati and Allan [5] represent term dependencies in a sentence using a maximum spanning tree and generate a sentence tree language model for the story link detection task in TDT. Syntactic parse of user queries can provide clues for when the word order constraint can be relaxed. Syn-\",\"PeriodicalId\":209809,\"journal\":{\"name\":\"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-07-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/860435.860523\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/860435.860523","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

摘要

信息检索的最新进展是基于使用统计语言模型(SLM)来表示文档并评估它们与用户查询的相关性[6,3,4]。语言建模(LM)已经在许多自然语言任务中进行了探索,包括机器翻译和语音识别[1]。在文档检索的LM方法中,每个文档D被视为具有自己的语言模型MD。给定一个查询Q,文档根据生成该查询的语言模型的概率P (Q|MD)进行排名。虽然LM方法的信息检索已经从不同的角度得到了激励[3,4],但大多数实验都使用平滑的一元语言模型来估计文档语言模型,该模型假设术语独立。N-gram,具体地说,捕获前一个单词提供的上下文的双字母语言模型比单字母模型表现得更好[7]。双元语言模型[8]忽略了双元语言模型中的词序约束,其表现优于双元语言模型。然而,词序约束不能总是放松,因为盲目的威尼斯人并不是威尼斯人。术语依赖性可以使用它们的共现统计来度量。Nallapati和Allan[5]使用最大生成树表示句子中的术语依赖关系,并为TDT中的故事链接检测任务生成了句子树语言模型。用户查询的语法解析可以为何时可以放松词序限制提供线索。Syn -
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Incorporating query term dependencies in language models for document retrieval
Recent advances in Information Retrieval are based on using Statistical Language Models (SLM) for representing documents and evaluating their relevance to user queries [6, 3, 4]. Language Modeling (LM) has been explored in many natural language tasks including machine translation and speech recognition [1]. In LM approach to document retrieval, each document, D, is viewed to have its own language model, MD. Given a query, Q, documents are ranked based on the probability, P (Q|MD), of their language model generating the query. While the LM approach to information retrieval has been motivated from different perspectives [3, 4], most experiments have used smoothed unigram language models that assume term independence for estimating document language models. N-gram, specifically, bigram language models that capture context provided by the previous word(s) perform better than unigram models [7]. Biterm language models [8] that ignore the word order constraint in bigram language models have been shown to perform better than bigram models. However, word order constraint cannot always be relaxed since a blind venetian is not a venetian blind. Term dependencies can be measured using their co-occurrence statistics. Nallapati and Allan [5] represent term dependencies in a sentence using a maximum spanning tree and generate a sentence tree language model for the story link detection task in TDT. Syntactic parse of user queries can provide clues for when the word order constraint can be relaxed. Syn-
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信