Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval: Latest Papers

Analysis of the Paragraph Vector Model for Information Retrieval
Qingyao Ai, Liu Yang, Jiafeng Guo, W. Bruce Croft
{"title":"Analysis of the Paragraph Vector Model for Information Retrieval","authors":"Qingyao Ai, Liu Yang, Jiafeng Guo, W. Bruce Croft","doi":"10.1145/2970398.2970409","DOIUrl":"https://doi.org/10.1145/2970398.2970409","url":null,"abstract":"Previous studies have shown that semantically meaningful representations of words and text can be acquired through neural embedding models. In particular, paragraph vector (PV) models have shown impressive performance in some natural language processing tasks by estimating a document (topic) level language model. Integrating the PV models with traditional language model approaches to retrieval, however, produces unstable performance and limited improvements. In this paper, we formally discuss three intrinsic problems of the original PV model that restrict its performance in retrieval tasks. We also describe modifications to the model that make it more suitable for the IR task, and show their impact through experiments and case studies. The three issues we address are (1) the unregulated training process of PV is vulnerable to short document over-fitting that produces length bias in the final retrieval model; (2) the corpus-based negative sampling of PV leads to a weighting scheme for words that overly suppresses the importance of frequent words; and (3) the lack of word-context information makes PV unable to capture word substitution relationships.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123629949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 79
Embedding-based Query Language Models
Hamed Zamani, W. Bruce Croft
{"title":"Embedding-based Query Language Models","authors":"Hamed Zamani, W. Bruce Croft","doi":"10.1145/2970398.2970405","DOIUrl":"https://doi.org/10.1145/2970398.2970405","url":null,"abstract":"Word embeddings, which are low-dimensional vector representations of vocabulary terms that capture the semantic similarity between them, have recently been shown to achieve impressive performance in many natural language processing tasks. The use of word embeddings in information retrieval, however, has only begun to be studied. In this paper, we explore the use of word embeddings to enhance the accuracy of query language models in the ad-hoc retrieval task. To this end, we propose to use word embeddings to incorporate and weight terms that do not occur in the query, but are semantically related to the query terms. We describe two embedding-based query expansion models with different assumptions. Since pseudo-relevance feedback methods that use the top retrieved documents to update the original query model are well-known to be effective, we also develop an embedding-based relevance model, an extension of the effective and robust relevance model approach. In these models, we transform the similarity values obtained by the widely-used cosine similarity with a sigmoid function to have more discriminative semantic similarity values. We evaluate our proposed methods using three TREC newswire and web collections. The experimental results demonstrate that the embedding-based methods significantly outperform competitive baselines in most cases. The embedding-based methods are also shown to be more robust than the baselines.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115770565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 130
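The abstract above mentions passing cosine similarity through a sigmoid function to obtain more discriminative similarity values. A minimal Python sketch of that idea follows. The steepness `a` and midpoint `c` are hypothetical parameters chosen for illustration, not values from the paper, and the exact functional form used by the authors may differ. The sketch also assumes both vectors are non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sigmoid_similarity(u, v, a=10.0, c=0.8):
    """Squash cosine similarity through a sigmoid.

    a (steepness) and c (midpoint) are illustrative assumptions:
    similarities above c are pushed towards 1, those below towards 0.
    """
    return 1.0 / (1.0 + math.exp(-a * (cosine(u, v) - c)))
```

With these assumed defaults, term pairs whose cosine similarity is well below 0.8 contribute almost nothing, which is one way to make the similarity scores more discriminative.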
Bag-of-Entities Representation for Ranking
Chenyan Xiong, Jamie Callan, Tie-Yan Liu
{"title":"Bag-of-Entities Representation for Ranking","authors":"Chenyan Xiong, Jamie Callan, Tie-Yan Liu","doi":"10.1145/2970398.2970423","DOIUrl":"https://doi.org/10.1145/2970398.2970423","url":null,"abstract":"This paper presents a new bag-of-entities representation for document ranking, with the help of modern knowledge bases and automatic entity linking. Our system represents query and documents by bag-of-entities vectors constructed from their entity annotations, and ranks documents by their matches with the query in the entity space. Our experiments with Freebase on TREC Web Track datasets demonstrate that current entity linking systems can provide sufficient coverage of the general domain search task, and that bag-of-entities representations outperform bag-of-words by as much as 18% in standard document ranking tasks.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"94 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130721504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 44
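The abstract above describes representing queries and documents as bag-of-entities vectors and ranking documents by their matches with the query in entity space. The following Python sketch illustrates one simple reading of that description. The scoring function (counting distinct query entities present in a document), the function names, and the toy entity IDs are assumptions made for illustration; the abstract does not specify the paper's actual ranking model.

```python
from collections import Counter


def bag_of_entities(entity_annotations):
    """Build an entity-frequency vector from a list of linked entity IDs."""
    return Counter(entity_annotations)


def coordinate_match(query_boe, doc_boe):
    """Count the distinct query entities that also occur in the document."""
    return sum(1 for entity in query_boe if doc_boe[entity] > 0)


def rank(query_entities, docs):
    """Rank document IDs by entity matches (ties broken by document ID).

    docs maps a document ID to the list of entity IDs annotated in it.
    """
    query_boe = bag_of_entities(query_entities)
    scored = [
        (coordinate_match(query_boe, bag_of_entities(entities)), doc_id)
        for doc_id, entities in docs.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

In practice the entity annotations would come from an automatic entity linker; here they are simply passed in as lists of IDs.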
Estimating Retrieval Performance Bound for Single Term Queries
Peilin Yang, Hui Fang
{"title":"Estimating Retrieval Performance Bound for Single Term Queries","authors":"Peilin Yang, Hui Fang","doi":"10.1145/2970398.2970428","DOIUrl":"https://doi.org/10.1145/2970398.2970428","url":null,"abstract":"Various information retrieval models have been studied for decades. Most traditional retrieval models are based on bag-of-term representations, and they model the relevance based on various collection statistics. Despite these efforts, it seems that the performance of \"bag-of-term\" based retrieval functions has reached a plateau, and it becomes increasingly difficult to further improve the retrieval performance. Thus, one important research question is whether we can provide any theoretical justifications on the empirical performance bound of basic retrieval functions. In this paper, we start with single term queries, and aim to estimate the performance bound of retrieval functions that leverage only basic ranking signals such as document term frequency, inverse document frequency and document length normalization. Specifically, we demonstrate that, when only single-term queries are considered, there is a general function that can cover many basic retrieval functions. We then propose to estimate the upper bound performance of this function by applying a cost/gain analysis to search for the optimal value of the function.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129757109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Estimating Embedding Vectors for Queries
Hamed Zamani, W. Bruce Croft
{"title":"Estimating Embedding Vectors for Queries","authors":"Hamed Zamani, W. Bruce Croft","doi":"10.1145/2970398.2970403","DOIUrl":"https://doi.org/10.1145/2970398.2970403","url":null,"abstract":"Dense vector representations of vocabulary terms, also known as word embeddings, have been shown to be highly effective in many natural language processing tasks. Word embeddings have recently begun to be studied in a number of information retrieval (IR) tasks. One of the main steps in leveraging word embeddings for IR tasks is to estimate the embedding vectors of queries. This is a challenging task, since queries are not always available during the training phase of word embedding vectors. Previous work has considered the average or sum of embedding vectors of all query terms (AWE) to model the query embedding vectors, but no theoretical justification has been presented for such a model. In this paper, we propose a theoretical framework for estimating query embedding vectors based on the individual embedding vectors of vocabulary terms. We then provide a number of different implementations of this framework and show that the AWE method is a special case of the proposed framework. We also introduce pseudo query vectors, the query embedding vectors estimated using pseudo-relevant documents. We further extrinsically evaluate the proposed methods using two well-known IR tasks: query expansion and query classification. The estimated query embedding vectors are evaluated via query expansion experiments over three newswire and web TREC collections as well as query classification experiments over the KDD Cup 2005 test set. The experiments show that the introduced pseudo query vectors significantly outperform the AWE method.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131582009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 104
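The abstract above refers to the AWE baseline, in which a query embedding is the average of the embedding vectors of the query terms. A minimal Python sketch of that baseline is shown here. The function name, the toy two-dimensional embedding table, and the choice to skip out-of-vocabulary terms are assumptions for illustration; the sketch does not implement the paper's proposed framework or its pseudo query vectors.

```python
def average_word_embedding(query_terms, embeddings):
    """Average the embedding vectors of the query terms (AWE baseline).

    embeddings maps a term to a list of floats. Terms missing from the
    table are skipped (an assumption); returns None if none are found.
    """
    vectors = [embeddings[t] for t in query_terms if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy, hand-made embedding table used only to show the call pattern.
toy_embeddings = {"neural": [1.0, 0.0], "ranking": [0.0, 1.0]}
query_vector = average_word_embedding(["neural", "ranking", "unseen"], toy_embeddings)
```

The resulting vector can then be compared with term or document vectors, for example by cosine similarity, in tasks such as query expansion.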
Fast Feature Selection for Learning to Rank
Andrea Gigli, C. Lucchese, F. M. Nardini, R. Perego
{"title":"Fast Feature Selection for Learning to Rank","authors":"Andrea Gigli, C. Lucchese, F. M. Nardini, R. Perego","doi":"10.1145/2970398.2970433","DOIUrl":"https://doi.org/10.1145/2970398.2970433","url":null,"abstract":"An emerging research area named Learning-to-Rank (LtR) has shown that effective solutions to the ranking problem can leverage machine learning techniques applied to a large set of features capturing the relevance of a candidate document for the user query. Large-scale search systems must however answer user queries very fast, and the computation of the features for candidate documents must comply with strict back-end latency constraints. The number of features cannot thus grow beyond a given limit, and Feature Selection (FS) techniques have to be exploited to find a subset of features that both meets latency requirements and leads to high effectiveness of the trained models. In this paper, we propose three new algorithms for FS specifically designed for the LtR context where hundreds of continuous or categorical features can be involved. We present a comprehensive experimental analysis conducted on publicly available LtR datasets and we show that the proposed strategies outperform a well-known state-of-the-art competitor.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"89 36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129793672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
Topic Set Size Design and Power Analysis in Practice
T. Sakai
{"title":"Topic Set Size Design and Power Analysis in Practice","authors":"T. Sakai","doi":"10.1145/2970398.2970443","DOIUrl":"https://doi.org/10.1145/2970398.2970443","url":null,"abstract":"Topic set size design methods provide principles and procedures for test collection builders to decide on the number of topics to create. These methods can then help us keep improving the test collection design based on accumulated data. Simple Excel tools are available for such purposes. Post-hoc power analysis tools, available as simple R scripts, can help IR researchers examine the achieved power of a reported experiment and determine future sample sizes for ensuring high power. Thus, for example, underpowered user experiments can be detected, and a larger sample size can be proposed. If used appropriately, these Excel and R tools should be able to provide the IR community with better experimentation practices. The main objective of this tutorial is to let IR researchers familiarise themselves with these tools and understand the basic ideas behind them.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115169593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Exploring Urban Lifestyles Using a Nonparametric Temporal Graphical Model
Shoaib Jameel, Yi Liao, Wai Lam, S. Schockaert, Xing Xie
{"title":"Exploring Urban Lifestyles Using a Nonparametric Temporal Graphical Model","authors":"Shoaib Jameel, Yi Liao, Wai Lam, S. Schockaert, Xing Xie","doi":"10.1145/2970398.2970401","DOIUrl":"https://doi.org/10.1145/2970398.2970401","url":null,"abstract":"We propose a new unsupervised nonparametric temporal topic model to discover lifestyle patterns from location-based social networks. By relating the textual content, time stamps, and venue categories associated to user check-ins, our framework detects the predominant lifestyle patterns in a given geographic region. The temporal component of our model allows us to analyse the evolution of lifestyle patterns throughout the year. We provide examples of interesting patterns that have been discovered by our model, and we show that our model compares favourably to existing approaches in terms of lifestyle pattern quality and computation time. We also quantitatively show that our model outperforms existing methods in a time stamp prediction task.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125743856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Collaborative Information Retrieval: Frameworks, Theoretical Models, and Emerging Topics
L. Tamine, L. Soulier
{"title":"Collaborative Information Retrieval: Frameworks, Theoretical Models, and Emerging Topics","authors":"L. Tamine, L. Soulier","doi":"10.1145/2970398.2970442","DOIUrl":"https://doi.org/10.1145/2970398.2970442","url":null,"abstract":"A great amount of research in the IR domain has dealt mostly with the design of enhanced document ranking models allowing search improvement through user-to-system collaboration. However, in addition to the user-to-system form of collaboration, user-to-user collaboration is increasingly acknowledged as an effective means for gathering the complementary skills and/or knowledge of individual users in order to solve complex search tasks. This tutorial will first give an overview of the ways in which collaboration has been implemented in IR models with the aim of improving the search outcomes with respect to several tasks and related frameworks (ad-hoc search, group-based recommendation, social search, collaborative search). Second, as envisioned in the collaborative IR (CIR) domain, we will focus on the theoretical models that support and drive user-to-user collaboration in order to perform shared IR tasks. Third, we will develop a road map on emerging and relevant topics addressing issues related to collaboration design. Our goal is to provide participants with concepts and motivation allowing them to investigate this emerging IR domain as well as giving them some clues on how to tackle issues related to the optimization of collaborative tasks. More specifically, the tutorial aims to: (a) Give an overview of the key concept of collaboration in IR and related research topics; (b) Present state-of-the-art CIR techniques and models; (c) Discuss the emerging topics that deal with collaboration; (d) Point out some challenges ahead.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125687537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Temporal Query Expansion Using a Continuous Hidden Markov Model
J. Rao, Jimmy J. Lin
{"title":"Temporal Query Expansion Using a Continuous Hidden Markov Model","authors":"J. Rao, Jimmy J. Lin","doi":"10.1145/2970398.2970424","DOIUrl":"https://doi.org/10.1145/2970398.2970424","url":null,"abstract":"In standard formulations of pseudo-relevance feedback, document timestamps do not play a role in identifying expansion terms. Yet we know that when searching social media posts such as tweets, relevant documents are bursty and usually occur in temporal clusters. The main insight of our work is that term expansions should be biased to draw from documents that occur in bursty temporal clusters. This is formally captured by a continuous hidden Markov model (cHMM), for which we derive an EM algorithm for parameter estimation. Given a query, we estimate the parameters for a cHMM that best explains the observed distribution of an initial set of retrieved documents, and then use Viterbi decoding to compute the most likely state sequence. In identifying expansion terms, we only select documents from bursty states. Experiments on test collections from the TREC 2011 and 2012 Microblog tracks show that our approach is significantly more effective than the popular RM3 pseudo-relevance feedback model.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129849111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9