{"title":"Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking","authors":"Harrie Oosterhuis, M. de Rijke","doi":"10.1145/3409256.3409820","DOIUrl":"https://doi.org/10.1145/3409256.3409820","url":null,"abstract":"Counterfactual evaluation can estimate Click-Through-Rate (CTR) differences between ranking systems based on historical interaction data, while mitigating the effect of position bias and item-selection bias. We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data so that the counterfactual estimate has minimal variance. As minimizing variance leads to faster convergence, LogOpt increases the data-efficiency of counterfactual estimation. LogOpt turns the counterfactual approach - which is indifferent to the logging policy - into an online approach, where the algorithm decides what rankings to display. We prove that, as an online evaluation method, LogOpt is unbiased w.r.t. position and item-selection bias, unlike existing interleaving methods. Furthermore, we perform large-scale experiments by simulating comparisons between thousands of rankers. Our results show that while interleaving methods make systematic errors, LogOpt is as efficient as interleaving without being biased.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115548257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding BERT Rankers Under Distillation","authors":"Luyu Gao, Zhuyun Dai, Jamie Callan","doi":"10.1145/3409256.3409838","DOIUrl":"https://doi.org/10.1145/3409256.3409838","url":null,"abstract":"Deep language models, such as BERT pre-trained on large corpora, have given a huge performance boost to state-of-the-art information retrieval ranking systems. Knowledge embedded in such models allows them to pick up complex matching signals between passages and queries. However, the high computation cost during inference limits their deployment in real-world search scenarios. In this paper, we study if and how the knowledge for search within BERT can be transferred to a smaller ranker through distillation. Our experiments demonstrate that it is crucial to use a proper distillation procedure, which produces up to nine times speedup while preserving the state-of-the-art performance.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129731627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Sentiment Analysis for Pseudo-Relevance Feedback in Social Book Search","authors":"Amal Htait, S. Fournier, P. Bellot, L. Azzopardi, G. Pasi","doi":"10.1145/3409256.3409847","DOIUrl":"https://doi.org/10.1145/3409256.3409847","url":null,"abstract":"Book search is a challenging task due to discrepancies between the content and description of books, on one side, and the ways in which people query for books, on the other. However, online reviewers provide an opinionated description of the book, with alternative features that describe the emotional and experiential aspects of the book. Therefore, locating emotional sentences within reviews, could provide a rich alternative source of evidence to help improve book recommendations. Specifically, sentiment analysis (SA) could be employed to identify salient emotional terms, which could then be used for query expansion? This paper explores the employment ofSA based query expansion, in the book search domain. We introduce a sentiment-oriented method for the selection of sentences from the reviews of top rated book. From these sentences, we extract the terms to be employed in the query formulation. The sentence selection process is based on a semi-supervised SA method, which makes use of adapted word embeddings and lexicon seed-words.Using the CLEF 2016 Social Book Search (SBS) Suggestion TrackCollection, an exploratory comparison between standard pseudo-relevance feedback and the proposed sentiment-based approach is performed. The experiments show that the proposed approach obtains 24%-57% improvement over the baselines, whilst the classic technique actually degrades the performance by 14%-51%.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126542692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Test Collection Construction via Active Learning","authors":"Md. Mustafizur Rahman, Mucahid Kutlu, T. Elsayed, Matthew Lease","doi":"10.1145/3409256.3409837","DOIUrl":"https://doi.org/10.1145/3409256.3409837","url":null,"abstract":"To create a new IR test collection at low cost, it is valuable to carefully select which documents merit human relevance judgments. Shared task campaigns such as NIST TREC pool document rankings from many participating systems (and often interactive runs as well) in order to identify the most likely relevant documents for human judging. However, if one's primary goal is merely to build a test collection, it would be useful to be able to do so without needing to run an entire shared task. Toward this end, we investigate multiple active learning strategies which, without reliance on system rankings: 1) select which documents human assessors should judge; and 2) automatically classify the relevance of additional unjudged documents. To assess our approach, we report experiments on five TREC collections with varying scarcity of relevant documents. We report labeling accuracy achieved, as well as rank correlation when evaluating participant systems based upon these labels vs. full pool judgments. Results show the effectiveness of our approach, and we further analyze how varying relevance scarcity across collections impacts our findings. To support reproducibility and follow-on work, we have shared our code onlinefootnoteurlhttps://github.com/mdmustafizurrahman/ICTIR_AL_TestCollection_2020/ .","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125169965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}