Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval: Latest Articles (Page 4)

Analysis of the Paragraph Vector Model for Information Retrieval

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970409

Qingyao Ai, Liu Yang, Jiafeng Guo, W. Bruce Croft

Abstract: Previous studies have shown that semantically meaningful representations of words and text can be acquired through neural embedding models. In particular, paragraph vector (PV) models have shown impressive performance in some natural language processing tasks by estimating a document (topic) level language model. Integrating the PV models with traditional language model approaches to retrieval, however, produces unstable performance and limited improvements. In this paper, we formally discuss three intrinsic problems of the original PV model that restrict its performance in retrieval tasks. We also describe modifications to the model that make it more suitable for the IR task, and show their impact through experiments and case studies. The three issues we address are (1) the unregulated training process of PV is vulnerable to short document over-fitting that produces length bias in the final retrieval model; (2) the corpus-based negative sampling of PV leads to a weighting scheme for words that overly suppresses the importance of frequent words; and (3) the lack of word-context information makes PV unable to capture word substitution relationships.

Citations: 79

Embedding-based Query Language Models

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970405

Hamed Zamani, W. Bruce Croft

Abstract: Word embeddings, which are low-dimensional vector representations of vocabulary terms that capture the semantic similarity between them, have recently been shown to achieve impressive performance in many natural language processing tasks. The use of word embeddings in information retrieval, however, has only begun to be studied. In this paper, we explore the use of word embeddings to enhance the accuracy of query language models in the ad-hoc retrieval task. To this end, we propose to use word embeddings to incorporate and weight terms that do not occur in the query, but are semantically related to the query terms. We describe two embedding-based query expansion models with different assumptions. Since pseudo-relevance feedback methods that use the top retrieved documents to update the original query model are well-known to be effective, we also develop an embedding-based relevance model, an extension of the effective and robust relevance model approach. In these models, we transform the similarity values obtained by the widely-used cosine similarity with a sigmoid function to have more discriminative semantic similarity values. We evaluate our proposed methods using three TREC newswire and web collections. The experimental results demonstrate that the embedding-based methods significantly outperform competitive baselines in most cases. The embedding-based methods are also shown to be more robust than the baselines.
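The sigmoid transformation of cosine similarity mentioned in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not give the sigmoid's parameters, so `steepness` and `midpoint` are hypothetical values, the toy embeddings are made up, and the averaging scheme in `expansion_weights` is a simple stand-in rather than either of the two expansion models the paper describes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def sigmoid_similarity(u, v, steepness=10.0, midpoint=0.8):
    """Cosine similarity passed through a sigmoid.

    steepness and midpoint are hypothetical values chosen for illustration;
    they are not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (cosine(u, v) - midpoint)))


def expansion_weights(query_terms, embeddings):
    """Weight each non-query term by its mean transformed similarity to the
    query terms, then normalize the weights to sum to 1 (assumed scheme)."""
    query_vecs = [embeddings[t] for t in query_terms if t in embeddings]
    scores = {}
    for term, vec in embeddings.items():
        if term in query_terms or not query_vecs:
            continue
        sims = [sigmoid_similarity(q, vec) for q in query_vecs]
        scores[term] = sum(sims) / len(sims)
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()} if total > 0 else {}


# Toy 2-d embeddings, invented for this example.
toy = {"car": [1.0, 0.0], "automobile": [0.9, 0.1], "road": [0.6, 0.8]}
weights = expansion_weights(["car"], toy)
```

With these toy values the near-synonym "automobile" receives most of the expansion weight, while "road", whose raw cosine similarity to "car" is still fairly high, is pushed down by the sigmoid.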

Citations: 130

Bag-of-Entities Representation for Ranking

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970423

Chenyan Xiong, Jamie Callan, Tie-Yan Liu

Citations: 44

Estimating Retrieval Performance Bound for Single Term Queries

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970428

Peilin Yang, Hui Fang

Citations: 4

Estimating Embedding Vectors for Queries

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970403

Hamed Zamani, W. Bruce Croft

Abstract: Dense vector representations of vocabulary terms, also known as word embeddings, have been shown to be highly effective in many natural language processing tasks. Word embeddings have recently begun to be studied in a number of information retrieval (IR) tasks. One of the main steps in leveraging word embeddings for IR tasks is to estimate the embedding vectors of queries. This is a challenging task, since queries are not always available during the training phase of word embedding vectors. Previous work has considered the average or sum of embedding vectors of all query terms (AWE) to model the query embedding vectors, but no theoretical justification has been presented for such a model. In this paper, we propose a theoretical framework for estimating query embedding vectors based on the individual embedding vectors of vocabulary terms. We then provide a number of different implementations of this framework and show that the AWE method is a special case of the proposed framework. We also introduce pseudo query vectors, the query embedding vectors estimated using pseudo-relevant documents. We further extrinsically evaluate the proposed methods using two well-known IR tasks: query expansion and query classification. The estimated query embedding vectors are evaluated via query expansion experiments over three newswire and web TREC collections as well as query classification experiments over the KDD Cup 2005 test set. The experiments show that the introduced pseudo query vectors significantly outperform the AWE method.
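The AWE baseline described in the abstract (averaging the embedding vectors of all query terms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `average_word_embedding`, the toy vectors, and the choice to skip out-of-vocabulary terms are hypothetical and not taken from the paper.

```python
def average_word_embedding(query_terms, embeddings):
    """Estimate a query vector as the element-wise mean of its term vectors.

    Terms missing from `embeddings` are skipped (an assumption made here);
    returns None when no query term has an embedding.
    """
    vectors = [embeddings[t] for t in query_terms if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy 2-d embeddings, invented for this example.
toy_embeddings = {"new": [1.0, 0.0], "york": [0.0, 1.0]}
query_vector = average_word_embedding(["new", "york", "zzz"], toy_embeddings)
```

Every in-vocabulary term contributes equally to the result, which is the property the abstract questions when it notes that no theoretical justification had been given for this model.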

Citations: 104

Fast Feature Selection for Learning to Rank

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970433

Andrea Gigli, C. Lucchese, F. M. Nardini, R. Perego

Citations: 18

Topic Set Size Design and Power Analysis in Practice

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970443

T. Sakai

Citations: 6

Exploring Urban Lifestyles Using a Nonparametric Temporal Graphical Model

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970401

Shoaib Jameel, Yi Liao, Wai Lam, S. Schockaert, Xing Xie

Citations: 1

Collaborative Information Retrieval: Frameworks, Theoretical Models, and Emerging Topics

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970442

L. Tamine, L. Soulier

Abstract: A great deal of research in the IR domain has dealt with the design of enhanced document ranking models that improve search through user-to-system collaboration. However, in addition to the user-to-system form of collaboration, user-to-user collaboration is increasingly acknowledged as an effective means of gathering the complementary skills and/or knowledge of individual users in order to solve complex search tasks. This tutorial will first give an overview of the ways in which collaboration has been implemented in IR models in an attempt to improve search outcomes with respect to several tasks and related frameworks (ad-hoc search, group-based recommendation, social search, collaborative search). Second, as envisioned in the collaborative IR (CIR) domain, we will focus on the theoretical models that support and drive user-to-user collaboration in order to perform shared IR tasks. Third, we will develop a road map on emerging and relevant topics addressing issues related to collaboration design. Our goal is to provide participants with concepts and motivation allowing them to investigate this emerging IR domain, as well as to give them some clues on how to tackle issues related to the optimization of collaborative tasks. More specifically, the tutorial aims to: (a) give an overview of the key concept of collaboration in IR and related research topics; (b) present state-of-the-art CIR techniques and models; (c) discuss emerging topics that deal with collaboration; (d) point out some challenges ahead.

Citations: 4

Temporal Query Expansion Using a Continuous Hidden Markov Model

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970424

J. Rao, Jimmy J. Lin

Citations: 9