Modeling click-through based word-pairs for web search

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval Pub Date : 2013-07-28 DOI:10.1145/2484028.2484082

Jagadeesh Jagarlamudi, Jianfeng Gao

{"title":"Modeling click-through based word-pairs for web search","authors":"Jagadeesh Jagarlamudi, Jianfeng Gao","doi":"10.1145/2484028.2484082","DOIUrl":null,"url":null,"abstract":"Statistical translation models and latent semantic analysis (LSA) are two effective approaches to exploiting click-through data for Web search ranking. While the former learns semantic relationships between query terms and document terms directly, the latter maps a document and the queries for which it has been clicked to vectors in a lower dimensional semantic space. This paper presents two document ranking models that combine the strengths of both the approaches by explicitly modeling word-pairs. The first model, called PairModel, is a monolingual ranking model based on word-pairs derived from click-through data. It maps queries and documents into a concept space spanned by these word-pairs. The second model, called Bilingual Paired Topic Model (BPTM), uses bilingual word translations and can jointly model query-document collections written in multiple languages. This model uses topics to capture term dependencies and maps queries and documents in multiple languages into a lower dimensional semantic sub-space spanned by the topics. These models are evaluated on the Web search task using real world data sets in three different languages. Results show that they consistently outperform various state-of-the-art baseline models, and the best result is obtained by interpolating PairModel and BPTM.","PeriodicalId":178818,"journal":{"name":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2484028.2484082","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

Abstract

Statistical translation models and latent semantic analysis (LSA) are two effective approaches to exploiting click-through data for Web search ranking. While the former learns semantic relationships between query terms and document terms directly, the latter maps a document and the queries for which it has been clicked to vectors in a lower dimensional semantic space. This paper presents two document ranking models that combine the strengths of both the approaches by explicitly modeling word-pairs. The first model, called PairModel, is a monolingual ranking model based on word-pairs derived from click-through data. It maps queries and documents into a concept space spanned by these word-pairs. The second model, called Bilingual Paired Topic Model (BPTM), uses bilingual word translations and can jointly model query-document collections written in multiple languages. This model uses topics to capture term dependencies and maps queries and documents in multiple languages into a lower dimensional semantic sub-space spanned by the topics. These models are evaluated on the Web search task using real world data sets in three different languages. Results show that they consistently outperform various state-of-the-art baseline models, and the best result is obtained by interpolating PairModel and BPTM.

查看原文本刊更多论文

为网络搜索建模基于点击的词对

统计翻译模型和潜在语义分析(LSA)是利用点击率数据进行网页搜索排名的两种有效方法。前者直接学习查询词和文档词之间的语义关系，而后者将文档和被点击的查询映射到较低维语义空间中的向量。本文提出了两个文档排序模型，通过显式地对词对建模，结合了这两种方法的优势。第一个模型名为PairModel，它是一个单语言排序模型，基于从点击量数据中获得的词对。它将查询和文档映射到由这些词对跨越的概念空间中。第二个模型称为双语配对主题模型(BPTM)，它使用双语单词翻译，可以联合建模用多种语言编写的查询文档集合。该模型使用主题捕获术语依赖关系，并将多种语言的查询和文档映射到由主题跨越的较低维度语义子空间中。这些模型使用三种不同语言的真实世界数据集在Web搜索任务上进行评估。结果表明，它们的性能始终优于各种最先进的基线模型，并且通过插值PairModel和BPTM获得了最好的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

自引率

0.00%

发文量