{"title":"在用于文档检索的语言模型中合并查询词依赖关系","authors":"Munirathnam Srikanth, R. Srihari","doi":"10.1145/860435.860523","DOIUrl":null,"url":null,"abstract":"Recent advances in Information Retrieval are based on using Statistical Language Models (SLM) for representing documents and evaluating their relevance to user queries [6, 3, 4]. Language Modeling (LM) has been explored in many natural language tasks including machine translation and speech recognition [1]. In LM approach to document retrieval, each document, D, is viewed to have its own language model, MD. Given a query, Q, documents are ranked based on the probability, P (Q|MD), of their language model generating the query. While the LM approach to information retrieval has been motivated from different perspectives [3, 4], most experiments have used smoothed unigram language models that assume term independence for estimating document language models. N-gram, specifically, bigram language models that capture context provided by the previous word(s) perform better than unigram models [7]. Biterm language models [8] that ignore the word order constraint in bigram language models have been shown to perform better than bigram models. However, word order constraint cannot always be relaxed since a blind venetian is not a venetian blind. Term dependencies can be measured using their co-occurrence statistics. Nallapati and Allan [5] represent term dependencies in a sentence using a maximum spanning tree and generate a sentence tree language model for the story link detection task in TDT. Syntactic parse of user queries can provide clues for when the word order constraint can be relaxed. Syn-","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"Incorporating query term dependencies in language models for document retrieval\",\"authors\":\"Munirathnam Srikanth, R. Srihari\",\"doi\":\"10.1145/860435.860523\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advances in Information Retrieval are based on using Statistical Language Models (SLM) for representing documents and evaluating their relevance to user queries [6, 3, 4]. Language Modeling (LM) has been explored in many natural language tasks including machine translation and speech recognition [1]. In LM approach to document retrieval, each document, D, is viewed to have its own language model, MD. Given a query, Q, documents are ranked based on the probability, P (Q|MD), of their language model generating the query. While the LM approach to information retrieval has been motivated from different perspectives [3, 4], most experiments have used smoothed unigram language models that assume term independence for estimating document language models. N-gram, specifically, bigram language models that capture context provided by the previous word(s) perform better than unigram models [7]. Biterm language models [8] that ignore the word order constraint in bigram language models have been shown to perform better than bigram models. However, word order constraint cannot always be relaxed since a blind venetian is not a venetian blind. Term dependencies can be measured using their co-occurrence statistics. 
Nallapati and Allan [5] represent term dependencies in a sentence using a maximum spanning tree and generate a sentence tree language model for the story link detection task in TDT. Syntactic parse of user queries can provide clues for when the word order constraint can be relaxed. Syn-\",\"PeriodicalId\":209809,\"journal\":{\"name\":\"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-07-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/860435.860523\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/860435.860523","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Recent advances in Information Retrieval are based on using Statistical Language Models (SLM) to represent documents and evaluate their relevance to user queries [6, 3, 4]. Language Modeling (LM) has been explored in many natural language tasks, including machine translation and speech recognition [1]. In the LM approach to document retrieval, each document D is viewed as having its own language model M_D. Given a query Q, documents are ranked by the probability P(Q | M_D) of their language model generating the query. While the LM approach to information retrieval has been motivated from different perspectives [3, 4], most experiments have estimated document language models using smoothed unigram models that assume term independence. N-gram language models, specifically bigram models, which capture the context provided by the previous word(s), perform better than unigram models [7]. Biterm language models [8], which ignore the word-order constraint of bigram models, have been shown to perform better than bigram models. However, the word-order constraint cannot always be relaxed, since a blind Venetian is not a venetian blind. Term dependencies can be measured using co-occurrence statistics. Nallapati and Allan [5] represent the term dependencies in a sentence using a maximum spanning tree and generate a sentence-tree language model for the story link detection task in TDT. A syntactic parse of user queries can provide clues about when the word-order constraint can be relaxed. Syn-
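
The query-likelihood ranking described above can be illustrated with a minimal Python sketch. Jelinek-Mercer smoothing, the lambda value, and the toy data are illustrative assumptions, not the configuration used in the paper's experiments:

```python
import math
from collections import Counter

def query_likelihood(query_terms, doc_terms, coll_counts, coll_len, lam=0.5):
    """Score a document by log P(Q | M_D) under a smoothed unigram
    document language model (Jelinek-Mercer smoothing is an
    illustrative choice; the paper does not commit to this scheme)."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    log_p = 0.0
    for t in query_terms:
        p_doc = doc_counts[t] / doc_len if doc_len else 0.0
        p_coll = coll_counts[t] / coll_len  # background model keeps p > 0
        log_p += math.log(lam * p_doc + (1.0 - lam) * p_coll)
    return log_p

# Rank documents by descending log-likelihood of generating the query.
docs = {
    "d1": "venetian blind sales and repair".split(),
    "d2": "a blind venetian toured the city".split(),
}
coll_counts = Counter(t for terms in docs.values() for t in terms)
coll_len = sum(coll_counts.values())
query = "venetian blind".split()
print(sorted(docs, reverse=True,
             key=lambda d: query_likelihood(query, docs[d], coll_counts, coll_len)))
```

Note that under the term-independence assumption, d1 and d2 receive nearly identical scores for this query, differing only through document length; this is exactly the limitation that bigram and biterm models are meant to address.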
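The contrast between bigram and biterm models reduces to whether adjacent word pairs are counted in order or as unordered sets. A minimal sketch of the two counting schemes (the function names are mine, not the paper's):

```python
from collections import Counter

def bigram_counts(tokens):
    # Ordered adjacent pairs: "blind venetian" and "venetian blind"
    # are counted as distinct events.
    return Counter(zip(tokens, tokens[1:]))

def biterm_counts(tokens):
    # Unordered adjacent pairs: the word-order constraint is dropped,
    # so both orderings collapse into the same event.
    return Counter(tuple(sorted(pair)) for pair in zip(tokens, tokens[1:]))

tokens = "a blind venetian bought a venetian blind".split()
print(bigram_counts(tokens)[("venetian", "blind")])   # 1
print(biterm_counts(tokens)[("blind", "venetian")])   # 2
```

Relaxing word order in this way helps when the ordering is incidental (e.g., "information retrieval" vs. "retrieval of information") but discards exactly the distinction the abstract's Venetian example shows can matter.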