Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval: Latest Articles, Page 2

Exploiting Entity Linking in Queries for Entity Retrieval

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970406

Faegheh Hasibi, K. Balog, Svein Erik Bratsberg

Citations: 82

A Reproducibility Study of Information Retrieval Models

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970415

Peilin Yang, Hui Fang

Abstract: Developing effective information retrieval models has been a long-standing challenge in Information Retrieval (IR), and significant progress has been made over the years. With the increasing number of developed retrieval functions and the release of new data collections, it becomes more difficult, if not impossible, to compare a new retrieval function with all existing retrieval functions over all available data collections. To tackle this problem, this paper describes our efforts on constructing a platform that aims to improve the reproducibility of IR research and facilitate the evaluation and comparison of retrieval functions. With the developed platform, more than 20 state-of-the-art retrieval functions have been implemented and systematically evaluated over 16 standard TREC collections (including the newly released ClueWeb datasets). Our reproducibility study leads to several interesting observations. First, the performance difference between the reproduced results and those reported in the original papers is small for most retrieval functions. Second, the optimal performance of a few representative retrieval functions is still comparable over the new TREC ClueWeb collections. Finally, the developed platform (i.e., RISE) is made publicly available so that any IR researcher can use it to evaluate other retrieval functions.

Citations: 18

A Study of Document Expansion using Translation Models and Dimensionality Reduction Methods

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970439

Saeid Balaneshinkordan, Alexander Kotov

Abstract: Over a decade of research on document expansion methods resulted in several independent avenues, including smoothing methods, translation models, and dimensionality reduction techniques, such as matrix decompositions and topic models. Although these research avenues have been individually explored in many previous studies, there is still a lack of understanding of how state-of-the-art methods for each of these directions compare with each other in terms of retrieval accuracy. This paper closes this gap by reporting the results of an empirical comparison of document expansion methods using translation models estimated based on word co-occurrence and cosine similarity between low-dimensional word embeddings, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), on standard TREC collections. Experimental results indicate that LDA-based document expansion consistently outperforms both types of translation models and NMF according to all evaluation metrics, for all queries as well as for difficult queries, and is closely followed by the translation model using word embeddings.

Citations: 2

Nearest Neighbour based Transformation Functions for Text Classification: A Case Study with StackOverflow

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970426

Piyush Arora, Debasis Ganguly, G. Jones

Abstract: A significant increase in the number of questions in question answering forums has led to interest in text categorization methods for classifying a newly posted question as good (suitable) or bad (otherwise) for the forum. Standard text categorization approaches, e.g. multinomial Naive Bayes, are likely to be unsuitable for this classification task because of: i) the lack of sufficient informative content in the questions due to their relatively short length; and ii) considerable vocabulary overlap between the classes. To increase the robustness of this classification task, we propose to use the neighbourhood of existing questions which are similar to the newly asked question. Instead of learning the classification boundary from the questions alone, we transform each question vector into a different one in the feature space. We explore two different neighbourhood functions, using the discrete term space and the continuous vector space of real numbers obtained from vector embeddings of documents. Experiments conducted on StackOverflow data show that our approach of using the neighbourhood transformation can improve classification accuracy by up to about 8%.

Citations: 4

A Simple and Effective Approach to Score Standardisation

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970399

T. Sakai

Abstract: Webber, Moffat and Zobel proposed score standardisation for information retrieval evaluation with multiple test collections. Given a topic-by-run raw score matrix in terms of some evaluation measure, each score can be standardised using the topic's sample mean and sample standard deviation across a set of past runs so as to quantify how different a system is from the "average" system in standard deviation units. Using standardised scores, researchers can compare systems across different test collections without worrying about topic hardness or normalisation. While Webber et al. mapped the standardised scores to the [0, 1] range using a standard normal cumulative density function, the present study demonstrates that linear transformation of the standardised scores, a method widely used in educational research, can be a simple and effective alternative. We use three TREC robust track data sets with graded relevance assessments and official runs to compare these methods by means of leave-one-out tests, discriminative power, swap rate tests, and topic set size design. In particular, we demonstrate that our method is superior to the method of Webber et al. in terms of swap rates and topic set size design: put simply, our method ensures pairwise system comparisons that are more consistent across different data sets, and is arguably more convenient for designing a new test collection from a statistical viewpoint.

Citations: 17
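
The score standardisation summarised in the Sakai entry above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names are hypothetical, and the linear-map constants `a` and `b` are illustrative placeholders that do not come from the abstract. The sketch standardises one raw score against a topic's past runs, then maps the result to [0, 1] either with the standard normal CDF or with a clipped linear transformation.

```python
import math
from statistics import mean, stdev


def standardise(score, past_scores):
    """Z-score of `score` relative to one topic's scores from past runs."""
    mu = mean(past_scores)
    sigma = stdev(past_scores)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        # All past runs scored identically; treat the score as "average".
        return 0.0
    return (score - mu) / sigma


def to_unit_range_cdf(z):
    """Map a z-score to [0, 1] with the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def to_unit_range_linear(z, a=0.15, b=0.5):
    """Map a z-score to [0, 1] with a*z + b, clipped at the boundaries.

    The defaults for `a` and `b` are assumptions chosen for illustration.
    """
    return min(1.0, max(0.0, a * z + b))


if __name__ == "__main__":
    past = [0.2, 0.4, 0.6]        # hypothetical per-topic scores of past runs
    z = standardise(0.6, past)    # one standard deviation above the mean
    print(z, to_unit_range_cdf(z), to_unit_range_linear(z))
```

The CDF map compresses extreme z-scores smoothly towards 0 and 1, whereas the linear map preserves differences between z-scores until the clipping boundaries are reached.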

Who Wants to Join Me?: Companion Recommendation in Location Based Social Networks

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970420

Yi Liao, Wai Lam, Shoaib Jameel, S. Schockaert, Xing Xie

Citations: 11

Understanding the Message of Images with Knowledge Base Traversals

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970414

Lydia Weiland, Ioana Hulpus, Simone Paolo Ponzetto, Laura Dietz

Abstract: The message of news articles is often supported by the pointed use of iconic images. These images together with their captions encourage emotional involvement of the reader. Current algorithms for understanding the semantics of news articles focus on their text, often ignoring the image. On the other hand, works that target the semantics of images mostly focus on recognizing and enumerating the objects that appear in the image. In this work, we explore the problem from another perspective: Can we devise algorithms to understand the message encoded by images and their captions? To answer this question, we study how well algorithms can describe an image-caption pair in terms of Wikipedia entities, thereby casting the problem as an entity-ranking task with an image-caption pair as query. Our proposed algorithm brings together aspects of entity linking, subgraph selection, entity clustering, relatedness measures, and learning-to-rank. In our experiments, we focus on media-iconic image-caption pairs which often reflect complex subjects such as sustainable energy and endangered species. Our test collection includes a gold standard of over 300 image-caption pairs about topics at different levels of abstraction. We show that with a MAP of 0.69, the best results are obtained when aggregating content-based and graph-based features in a Wikipedia-derived knowledge base.

Citations: 6

A Topical Approach to Retrievability Bias Estimation

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970437

C. Wilkie, L. Azzopardi

Abstract: Retrievability is an independent evaluation measure that offers insights into an aspect of retrieval systems that performance and efficiency measures do not. Retrievability is often used to calculate the retrievability bias, an indication of how accessible a system makes all the documents in a collection. Generally, computing the retrievability bias of a system requires a colossal number of queries to be issued to the system to gain an accurate estimate of the bias. However, what often matters is not the accuracy of the estimate but the relationship between the estimate of bias and performance when tuning a system's parameters. As such, reaching a stable estimation of bias for the system is more important than getting very accurate retrievability scores for individual documents. This work explores the idea of using topical subsets of the collection for query generation and bias estimation to form a local estimate of bias which correlates with the global estimate of retrievability bias. By using topical subsets, it would be possible to reduce the volume of queries required to reach an accurate estimate of retrievability bias, reducing the time and resources required to perform a retrievability analysis. Findings suggest that this is a viable approach to estimating retrievability bias and that the number of queries required can be reduced to less than a quarter of what was previously thought necessary.

Citations: 4

Advances in Formal Models of Search and Search Behaviour

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970440

L. Azzopardi, G. Zuccon

Abstract: Searching is performed in the context of a task and as such the value of the information found is with respect to the task. Recently, there has been a drive to develop formal models of information seeking and retrieval that consider the costs and benefits arising through the interaction with the interface/system and the information surfaced during that interaction. In this full-day tutorial we will focus on describing and explaining some of the more recent formal models of Information Seeking and Retrieval. The tutorial is structured into two parts. In the first part we will present a series of models that have been developed based on: (i) economic theory, (ii) decision theory, (iii) game theory and (iv) optimal foraging theory. The second part of the day will be dedicated to building models, where we will discuss different techniques to build and develop models from which we can draw testable hypotheses. During the tutorial participants will be challenged to develop various formal models, applying the techniques learnt during the day. We will then conclude with presentations on solutions followed by a summary and overview of challenges and future directions. This tutorial is aimed at participants wanting to know more about the various formal models of information seeking, search and retrieval that have been proposed. The tutorial will be presented at an intermediate level, and is designed to support participants who want to be able to understand and build such models.

Citations: 5

The Effect of Document Order and Topic Difficulty on Assessor Agreement

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI: 10.1145/2970398.2970431

T. T. Damessie, Falk Scholer, K. Järvelin, J. Culpepper

Citations: 9