Modeling click-through based word-pairs for web search
Jagadeesh Jagarlamudi, Jianfeng Gao
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, 2013-07-28
DOI: 10.1145/2484028.2484082 (https://doi.org/10.1145/2484028.2484082)
Citations: 5
Abstract
Statistical translation models and latent semantic analysis (LSA) are two effective approaches to exploiting click-through data for Web search ranking. The former learns semantic relationships between query terms and document terms directly, while the latter maps a document and the queries for which it has been clicked to vectors in a lower-dimensional semantic space. This paper presents two document ranking models that combine the strengths of both approaches by explicitly modeling word-pairs. The first model, called PairModel, is a monolingual ranking model based on word-pairs derived from click-through data. It maps queries and documents into a concept space spanned by these word-pairs. The second model, called Bilingual Paired Topic Model (BPTM), uses bilingual word translations and can jointly model query-document collections written in multiple languages. This model uses topics to capture term dependencies and maps queries and documents in multiple languages into a lower-dimensional semantic sub-space spanned by the topics. These models are evaluated on the Web search task using real-world data sets in three different languages. Results show that they consistently outperform various state-of-the-art baseline models, and the best result is obtained by interpolating PairModel and BPTM.
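
The abstract does not say how the PairModel and BPTM scores are interpolated. As a rough illustration only, the sketch below assumes a simple linear interpolation of two per-document relevance scores. The function name `interpolate_scores`, the `weight` parameter, and all score values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def interpolate_scores(
    scores_a: Dict[str, float],
    scores_b: Dict[str, float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank documents by a linear interpolation of two models' scores.

    Assumption: combined = weight * a + (1 - weight) * b. A document that
    one model did not score is treated as having score 0.0 for that model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    doc_ids = set(scores_a) | set(scores_b)
    combined = {
        doc_id: weight * scores_a.get(doc_id, 0.0)
        + (1.0 - weight) * scores_b.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Sort by descending score; break ties by document id for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores for one query, standing in for the two models' outputs.
pair_model_scores = {"d1": 1.0, "d2": 0.25, "d3": 0.5}
bptm_scores = {"d1": 0.0, "d2": 0.5, "d3": 0.75}

ranking = interpolate_scores(pair_model_scores, bptm_scores, weight=0.5)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

In a setup like this, `weight` would typically be tuned on held-out queries. Setting it to 1.0 or 0.0 falls back to ranking by one model alone.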