{"title":"Neural Learning to Rank using TensorFlow Ranking: A Hands-on Tutorial","authors":"Rama Kumar Pasumarthi, Sebastian Bruch, Michael Bendersky, Xuanhui Wang","doi":"10.1145/3341981.3350530","DOIUrl":"https://doi.org/10.1145/3341981.3350530","url":null,"abstract":"A number of open source packages harnessing the power of deep learning have emerged in recent years and are under active development, including TensorFlow, PyTorch and others. Supervised learning is one of the main use cases of deep learning packages. However, compared with the comprehensive support for classification or regression in open-source deep learning packages, there is a paucity of support for ranking problems. To address this gap, we developed TensorFlow Ranking: an open-source library for training large scale learning-to-rank models using deep learning in TensorFlow. The library is flexible and highly configurable: it provides an easy-to-use API to support different scoring mechanisms, loss functions, example weights, and evaluation metrics. In this tutorial, we will combine the theoretical and the practical aspects of TF-Ranking, and will cover how TF-Ranking can be effectively employed in a variety of learning-to-rank scenarios, and demonstrate how it can handle advanced losses, scoring functions and sparse textual features. Finally, we will provide a hands-on codelab using a learning-to-rank dataset which shows how to effectively incorporate sparse features for ranking.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134042927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
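The swappable scoring mechanisms and loss functions the TF-Ranking abstract highlights can be illustrated with a plain-NumPy sketch. The names below (`linear_score`, `softmax_loss_grad`, `train_step`) are ours for illustration only and are not the TF-Ranking API:

```python
import numpy as np

def linear_score(weights, features):
    """A trivial per-document scoring function: one score per row of features."""
    return features @ weights

def softmax_loss_grad(scores, labels):
    """Gradient w.r.t. scores of a listwise softmax cross entropy loss."""
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    return probs - labels / labels.sum()

def train_step(weights, features, labels, loss_grad, lr=0.1):
    """One gradient step. The loss gradient is passed in as a parameter,
    mirroring the pluggable-loss design the abstract describes."""
    scores = linear_score(weights, features)
    # Chain rule for a linear scorer: dL/dw = features^T . dL/dscores
    return weights - lr * features.T @ loss_grad(scores, labels)
```

Swapping in a different ranking loss only requires passing a different gradient function to `train_step`.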
{"title":"Personal Knowledge Graphs: A Research Agenda","authors":"K. Balog, Tom Kenter","doi":"10.1145/3341981.3344241","DOIUrl":"https://doi.org/10.1145/3341981.3344241","url":null,"abstract":"Knowledge graphs, organizing structured information about entities, and their attributes and relationships, are ubiquitous today. Entities, in this context, are usually taken to be anyone or anything considered to be globally important. This, however, rules out many entities people interact with on a daily basis. In this position paper, we present the concept of personal knowledge graphs: resources of structured information about entities personally related to their user, including the ones that might not be globally important. We discuss key aspects that separate them from general knowledge graphs, identify the main challenges involved in constructing and using them, and define a research agenda.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128622948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SADHAN: Hierarchical Attention Networks to Learn Latent Aspect Embeddings for Fake News Detection","authors":"Rahul Mishra, Vinay Setty","doi":"10.1145/3341981.3344229","DOIUrl":"https://doi.org/10.1145/3341981.3344229","url":null,"abstract":"Recently, false claims and misinformation have become rampant on the web, affecting election outcomes, societies and economies. Consequently, fact checking websites such as snopes.com and politifact.com are becoming popular. However, these websites require expert analysis which is slow and not scalable. Many recent works try to solve these challenges using machine learning models trained on a variety of features and a rich lexicon or, more recently, deep neural networks to avoid feature engineering. In this paper, we propose hierarchical deep attention networks to learn embeddings for various latent aspects of news. Contrary to existing solutions which only apply word-level self-attention, our model jointly learns the latent aspect embeddings for classifying false claims by applying hierarchical attention. Using several manually annotated high quality datasets such as Politifact, Snopes and Fever, we show that these learned aspect embeddings are strong predictors of false claims. We show that latent aspect embeddings learned from attention mechanisms improve the accuracy of false claim detection by up to 13.5% in terms of Macro F1 compared to a state-of-the-art attention mechanism guided by claim text (DeClarE). We also extract and visualize the evidence from external articles which supports or disproves the claims.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128470522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dataset and Baselines for e-Commerce Product Categorization","authors":"Yiu-Chang Lin, Pradipto Das, A. Trotman, S. Kallumadi","doi":"10.1145/3341981.3344237","DOIUrl":"https://doi.org/10.1145/3341981.3344237","url":null,"abstract":"We make available a document collection of a million product titles from 3,008 anonymized categories of the rakuten.com product catalog. The anonymization has been done due to intellectual property rights on the underlying data organization taxonomy. Our analysis of the characteristics of the 800,000 training and 20,000 validation titles shows that they match the test set of 180,000 titles. Twenty-six independent teams participated in an automatic product categorization challenge on this dataset. We present results and analysis and suggest strong baselines for this collection and task.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121859258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SearchIE","authors":"Sheikh Muhammad Sarwar, J. Allan","doi":"10.1145/3341981.3344248","DOIUrl":"https://doi.org/10.1145/3341981.3344248","url":null,"abstract":"We address the problem of entity extraction with very few examples using an information retrieval approach. Existing extraction approaches consider millions of features extracted from a large number of training data cases. Typically, these data cases are generated by a distant supervision approach with entities in a knowledge base. After that, a model is learned and entities are extracted. However, with extremely limited data a ranked list of relevant entities can be helpful to obtain user feedback to get more training data. As Information Retrieval (IR) is a natural choice for ranked list generation, we explore its effectiveness in such a limited data case. To this end, we propose SearchIE, a hybrid IR and NLP approach that indexes documents represented using handcrafted NLP features. At query time, SearchIE samples terms from a Logistic Regression model trained with extremely limited data. We explore SearchIE's potential by showing that it outperforms state-of-the-art NLP models in finding civilians killed by US police officers with only a single civilian name as an example.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"475 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123055684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Prediction for Non-Factoid Question Answering","authors":"Helia Hashemi, Hamed Zamani, W. Bruce Croft","doi":"10.1145/3341981.3344249","DOIUrl":"https://doi.org/10.1145/3341981.3344249","url":null,"abstract":"Estimating the quality of a result list, often referred to as query performance prediction (QPP), is a challenging and important task in information retrieval. It can be used as feedback to users, search engines, and system administrators. Although predicting the performance of retrieval models has been extensively studied for the ad-hoc retrieval task, the effectiveness of performance prediction methods for question answering (QA) systems is relatively unstudied. The short length of answers, the dominance of neural models in QA, and the re-ranking nature of most QA systems make performance prediction for QA a unique, important, and technically interesting task. In this paper, we introduce and motivate the task of performance prediction for non-factoid question answering and propose a neural performance predictor for this task. Our experiments on two recent datasets demonstrate that the proposed model outperforms competitive baselines in all settings.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133083298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why does this Entity matter?: Support Passage Retrieval for Entity Retrieval","authors":"Shubham Chatterjee, Laura Dietz","doi":"10.1145/3341981.3344243","DOIUrl":"https://doi.org/10.1145/3341981.3344243","url":null,"abstract":"Our goal is to complement an entity ranking with human-readable explanations of how those retrieved entities are connected to the information need. While related to the problem of support passage retrieval, in this paper, we explore two underutilized indicators of relevance: contextual entities and entity salience. The effectiveness of these indicators is studied within a supervised learning-to-rank framework on a dataset from TREC Complex Answer Retrieval. We find that salience is a useful indicator, but it is often not applicable. In contrast, although performance improvements are obtained by using contextual entities, using contextual words still outperforms contextual entities.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130788618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tutorial on Explainable Recommendation and Search","authors":"Yongfeng Zhang","doi":"10.1145/3341981.3353768","DOIUrl":"https://doi.org/10.1145/3341981.3353768","url":null,"abstract":"Explainable recommendation and search attempt to develop models or methods that not only generate high-quality recommendation or search results, but also intuitive explanations of the results for users or system designers, which can help to improve system transparency, persuasiveness, trustworthiness, and effectiveness. This is even more important in personalized search and recommendation scenarios, where users would like to know why a particular product, web page, news report, or friend suggestion appears in their own search and recommendation lists. The tutorial focuses on research on explainable recommendation and search algorithms, as well as their application in real-world systems such as search engines, e-commerce, and social networks. It aims at introducing and communicating explainable recommendation and search methods to the community, as well as gathering researchers and practitioners interested in this research direction for discussions, idea exchange, and research promotion.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128949739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Analysis of the Softmax Cross Entropy Loss for Learning-to-Rank with Binary Relevance","authors":"Sebastian Bruch, Xuanhui Wang, Michael Bendersky, Marc Najork","doi":"10.1145/3341981.3344221","DOIUrl":"https://doi.org/10.1145/3341981.3344221","url":null,"abstract":"One of the challenges of learning-to-rank for information retrieval is that ranking metrics are not smooth and as such cannot be optimized directly with gradient descent optimization methods. This gap has given rise to a large body of research that reformulates the problem to fit into existing machine learning frameworks or defines a surrogate, ranking-appropriate loss function. One such loss is ListNet's, which measures the cross entropy between a distribution over documents obtained from scores and another from ground-truth labels. This loss was designed to capture permutation probabilities and as such is considered to be only loosely related to ranking metrics. In this work, however, we show that the above statement is not entirely accurate. In fact, we establish an analytical connection between ListNet's loss and two popular ranking metrics in a learning-to-rank setup with binary relevance labels. In particular, we show that the loss bounds Mean Reciprocal Rank and Normalized Discounted Cumulative Gain. Our analysis sheds light on ListNet's behavior and explains its superior performance on binary labeled data over data with graded relevance.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122839336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
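The ListNet loss analyzed in this abstract is easy to state concretely. Below is a minimal NumPy sketch of the softmax cross entropy for one query's result list, alongside reciprocal rank for binary labels; function names are ours, not the paper's code:

```python
import numpy as np

def softmax_cross_entropy_loss(scores, labels):
    """ListNet-style listwise loss: cross entropy between the softmax of
    the model scores and the normalized relevance labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Softmax over the document scores for one query's result list.
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    # Normalize labels into a target distribution over documents.
    target = labels / labels.sum()
    return -np.sum(target * np.log(probs))

def mrr_binary(scores, labels):
    """Reciprocal rank for a single list with binary relevance labels."""
    order = np.argsort(-np.asarray(scores))
    ranked = np.asarray(labels)[order]
    if not ranked.any():
        return 0.0
    return 1.0 / (np.argmax(ranked) + 1)  # rank of first relevant document
```

With binary labels, the paper's analytical result relates this loss to Mean Reciprocal Rank and NDCG; the sketch only shows how the two quantities are computed, not the bound itself.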
{"title":"Sentence Retrieval for Entity List Extraction with a Seed, Context, and Topic","authors":"Sheikh Muhammad Sarwar, John Foley, Liu Yang, J. Allan","doi":"10.1145/3341981.3344250","DOIUrl":"https://doi.org/10.1145/3341981.3344250","url":null,"abstract":"We present a variation of the corpus-based entity set expansion and entity list completion task. A user-specified query and a sentence containing one seed entity are the input to the task. The output is a list of sentences that contain other instances of the entity class indicated by the input. We construct a semantic query expansion model that leverages topical context around the seed entity and scores sentences. The proposed model finds 46% of the target entity class by retrieving 20 sentences on average. It achieves 16% improvement over BM25 in terms of recall@20.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121477332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
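The recall@20 figure reported in this abstract is computed per query as the fraction of the target entity class found in the top-ranked sentences. A minimal sketch (the sentence ids are hypothetical):

```python
def recall_at_k(retrieved, relevant, k=20):
    """Fraction of relevant items found in the top-k retrieved results.

    retrieved: ranked list of sentence ids.
    relevant: set of sentence ids containing an entity of the target class.
    """
    if not relevant:
        return 0.0
    hits = sum(1 for s in retrieved[:k] if s in relevant)
    return hits / len(relevant)
```

For example, if two sentences are relevant and one of them appears in the top k, recall@k is 0.5.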