{"title":"Exploiting Stopping Time to Evaluate Accumulated Relevance","authors":"M. Ferrante, N. Ferro","doi":"10.1145/3409256.3409832","DOIUrl":"https://doi.org/10.1145/3409256.3409832","url":null,"abstract":"Evaluation measures are more or less explicitly based on user models which abstract how users interact with a ranked result list and how they accumulate utility from it. However, traditional measures typically come with a hard-coded user model which can be, at best, parametrized. Moreover, they take a deterministic approach which leads to assign a precise score to a system run. In this paper, we take a different angle and, by relying on Markov chains and random walks, we propose a new family of evaluation measures which are able to accommodate for different and flexible user models, allow for simulating the interaction of different users, and turn the score into a random variable which more richly describes the performance of a system. We also show how the proposed framework allows for instantiating and better explaining some state-of-the-art measures, like AP, RBP, DCG, and ERR.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129021853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate Nearest Neighbor Search and Lightweight Dense Vector Reranking in Multi-Stage Retrieval Architectures","authors":"Zhengkai Tu, Wei Yang, Zihang Fu, Yuqing Xie, Luchen Tan, Kun Xiong, Ming Li, Jimmy J. Lin","doi":"10.1145/3409256.3409818","DOIUrl":"https://doi.org/10.1145/3409256.3409818","url":null,"abstract":"In the context of a multi-stage retrieval architecture, we explore candidate generation based on approximate nearest neighbor (ANN) search and lightweight reranking based on dense vector representations. These results serve as input to slower but more accurate rerankers such as those based on transformers. Our goal is to characterize the effectiveness-efficiency tradeoff space in this context. We find that, on sentence-length segments of text, ANN techniques coupled with dense vector reranking dominate approaches based on inverted indexes, and thus our proposed design should be preferred. For paragraph-length segments, ANN-based and index-based techniques share the Pareto frontier, which means that the choice of alternatives depends on the desired operating point.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129195516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalising and Diversifying the Listening Experience","authors":"M. Lalmas","doi":"10.1145/3409256.3410464","DOIUrl":"https://doi.org/10.1145/3409256.3410464","url":null,"abstract":"of June 2020, over 250 million monthly active users across 92 markets worldwide listening to over 60 million tracks and 1.5M podcast titles. We help this audio find the right audience via our recommendation products, which include playlist recommendation, playlist sequencing, and podcast show and episode recommendation. A large percentage of audio consumption is from Home, which makes it a valuable space for surfacing personalised and diverse content. This talk will present some of the research we completed on how to personalize the listening experience, and what diversity means in the context of a personalised listening experience.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133485724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Search Engine Performance Measurement","authors":"M. Sanderson","doi":"10.1145/3409256.3410466","DOIUrl":"https://doi.org/10.1145/3409256.3410466","url":null,"abstract":"The information retrieval (IR) community is rightly proud of its passion for evaluation. This conference has been a welcome refuge when passion becomes obsession. ICTIR's transformation from a largely mathematically based theoretical forum to one that seeks generalizable observations from all areas perfectly suits the needs of IR. However, how much have researchers sought to generalize or model search from evaluation? I will present a set of research papers by others as well as my collaborators and I that since the early 1990s have reported generalizing observations from large scale tests. It's only relatively recently that I've come to realise that these results have been missed by many in the community, yet the models produced carry a great deal of valuable generalizing information about our retrieval systems.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126158385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query Performance Prediction for Multifield Document Retrieval","authors":"Haggai Roitman, Y. Mass, Guy Feigenblat, Roee Shraga","doi":"10.1145/3409256.3409821","DOIUrl":"https://doi.org/10.1145/3409256.3409821","url":null,"abstract":"The goal of the query performance prediction (QPP) task is to estimate retrieval effectiveness in the absence of relevance judgments. We consider a novel task of predicting the performance of multifield document retrieval. In this setting, documents are assumed to consist of several different textual descriptions (fields) on which the query is being evaluated. Overall, we study three predictor types. The first type applies a given basic QPP method directly on the retrieval's outcome. Building on the idea of reference-lists, the second type utilizes several pseudo-effective (PE) reference-lists. Each such list is retrieved by further evaluating the query over a specific (single) document field. The third predictor is built on the assumption that a high agreement among the single-field PE reference-lists attests to a more effective retrieval. Using three different multifield document retrieval tasks we demonstrate the merits of our extended QPP methods. Specifically, we show the important role that the intrinsic agreement among the single-field PE reference-lists plays in this extended QPP task.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131231620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multistage Ranking Strategy for Personalized Hotel Recommendation with Human Mobility Data","authors":"Yiwei Li, M. Fan, Jizhou Huang, Kan Li","doi":"10.1145/3409256.3409810","DOIUrl":"https://doi.org/10.1145/3409256.3409810","url":null,"abstract":"To increase user satisfaction and own income, more and more hotel booking sites begin to pay attention to personalized recommendation. However, almost all user preference information only comes from the user actions in the hotel reservation scenario. Obviously, this approach has its limitations in particular in situation of user cold start, i.e., when only little information is available about an individual user. In this paper, we focus on the hotel recommendation in mobile map applications, which has abundant human mobility data to provide extra personalized information for hotel search ranking. For this purpose, we propose a personalized multistage pairwise learning-to-ranking model, which can capture more personalized information by utilizing full scenarios hotel click data of users in map applications. At the same time, the multistage model can effectively solve the problem of cold start. Both offline and online evaluation results show that the proposed model significantly outperforms multiple strong baseline methods.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125593550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Memorable Information Retrieval","authors":"S. Qiu, U. Gadiraju, A. Bozzon","doi":"10.1145/3409256.3409830","DOIUrl":"https://doi.org/10.1145/3409256.3409830","url":null,"abstract":"Information overload is a problem many of us can relate to nowadays. The deluge of user generated content on the Internet, and the easy accessibility to a vast amount of data compounds the problem of remembering and retaining information that is consumed. To make information consumed more memorable, strategies such as note-taking have been found to be effective by augmenting human memory under specific conditions. This is based on the rationale that humans tend to recall information better if they have produced the information themselves. Previous works in online education have shown that conversational systems can improve learning effects. Although memorization is an important part of learning, the effect of conversation on human memorability remains unexplored. We aim to address this knowledge gap through an experimental study, by investigating human memorability in a classical information retrieval setup. We explore the impact of note-taking affordances and conversational interfaces on the memorability of information consumed by users. Our results show that traditional web search and note-taking have positive effects on knowledge gain, while the search engine with a conversational interface has the potential to augment long-term memorability. This work highlights the benefits of using note-taking and conversational interfaces to aid human memorability. Our findings have important implications on building information retrieval systems that cater to optimizing memorability of information consumed.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126777765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Rank Entities for Set Expansion from Unstructured Data","authors":"Puxuan Yu, Razieh Rahimi, Zhiqi Huang, J. Allan","doi":"10.1145/3409256.3409811","DOIUrl":"https://doi.org/10.1145/3409256.3409811","url":null,"abstract":"We propose using learning-to-rank for entity set expansion (ESE) from unstructured data, the task of finding \"sibling\" entities within a corpus that are from the set characterized by a small set of seed entities. We present a two-channel neural re-ranking model, NESE, that jointly learns exact and semantic matching of entity contexts through entity interaction features. Although entity set expansion has drawn increasing attention in the IR and NLP communities for its various applications, the lack of massive annotated entity sets has hindered the development of neural approaches. We describe DBpedia-Sets, a toolkit that automatically extracts entity sets from a plain text collection, thus providing a large amount of distant supervision data for neural model training. Experiments on real datasets of different scales from different domains show that NESE outperforms state-of-the-art approaches in terms of precision and MAP. Furthermore, evaluation through human annotations shows that the knowledge learned from the training data is generalizable.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125972387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid Conditional Variational Autoencoder Model for Personalised Top-n Recommendation","authors":"Yaxiong Wu, C. Macdonald, I. Ounis","doi":"10.1145/3409256.3409835","DOIUrl":"https://doi.org/10.1145/3409256.3409835","url":null,"abstract":"The interactions of users with a recommendation system are in general sparse, leading to the well-known cold-start problem. Side information, such as age, occupation, genre and category, have been widely used to learn latent representations for users and items in order to address the sparsity of users' interactions. Conditional Variational Autoencoders (CVAEs) have recently been adapted for integrating side information as conditions to constrain the learned latent factors and to thereby generate personalised recommendations. However, the learning of effective latent representations that encapsulate both user (e.g. demographic information) and item side information (e.g. item categories) is still challenging. In this paper, we propose a new recommendation model, called Hybrid Conditional Variational Autoencoder (HCVAE) model, for personalised top-n recommendation, which effectively integrates both user and item side information to tackle the cold-start problem. Two CVAE-based methods -- using conditions on the learned latent factors, or conditions on the encoders and decoders -- are compared for integrating side information as conditions. Our HCVAE model leverages user and item side information as part of the optimisation objective to help the model construct more expressive latent representations and to better capture attributes of the users and items (such as demographic, category preferences) within the personalised item probability distributions. Thorough and extensive experiments conducted on both the MovieLens and Ta-feng datasets demonstrate that the HCVAE model conditioned on user category preferences with conditions on the learned latent factors can significantly outperform common existing top-n recommendation approaches such as MF-based and VAE/CVAE-based models.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"1048 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122485481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICTIR Tutorial: Modern Query Performance Prediction: Theory and Practice","authors":"Haggai Roitman","doi":"10.1145/3409256.3409813","DOIUrl":"https://doi.org/10.1145/3409256.3409813","url":null,"abstract":"Query performance prediction (QPP) is a core information retrieval (IR) task whose primary goal is to assess retrieval quality in the absence of relevance judgments. Applications of QPP are numerous, and include, among others, automatic query reformulation, fusion and ranker selection, distributed search and content analysis. The main objective of this tutorial is to introduce recent advances in the sub-research area of QPP in IR, covering both theory and applications. On the theoretical side, we will introduce modern QPP frameworks, which have advanced our understanding of the core QPP task. On the application side, the tutorial will set the connection between QPP theory and its usage in various modern IR applications, discussing the pros and cons, limitations, challenges and open research questions.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125042207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}