{"title":"Incorporating Hierarchical Domain Information to Disambiguate Very Short Queries","authors":"Hamed Bonab, Mohammad Aliannejadi, John Foley, J. Allan","doi":"10.1145/3341981.3344251","DOIUrl":"https://doi.org/10.1145/3341981.3344251","url":null,"abstract":"Users often express their information needs using incomplete or ambiguous queries of only one or two terms in length, particularly in Web environments. The ambiguity of short queries is a recognized problem for information retrieval (IR) systems. In this study, we investigate various approaches for incorporating hierarchical domain information into IR models such that the domain specification resolves the ambiguity. To this end, we develop practical models for constructing evaluation datasets from existing corpora. In terms of effectiveness, we further study the trade-off between a short query and its domain specification information. In doing so, we find that domains with the highest number of relevant documents are not always the best ones to select. We also evaluate the utility of a domain hierarchy and find that incorporating the hierarchical structure of a collection into the retrieval model could have a high impact on short query disambiguation.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127306942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning of Human Information Foraging Behavior with a Search Engine","authors":"Xi Niu, Xiangyu Fan","doi":"10.1145/3341981.3344231","DOIUrl":"https://doi.org/10.1145/3341981.3344231","url":null,"abstract":"In this paper, a two-level deep learning framework is presented to model human information foraging behavior with search engines. A recurrent neural network architecture is designed using LSTM as the base unit to explicitly consider the temporal and spatial dependencies of information scents, the key concept in Information Foraging Theory. The target is to predict several major search behaviors, such as query abandonment, query reformulation, number of clicks, and information gain. The memory capability and the sequence structure of LSTM allow the model to naturally mimic not only what users are perceiving and performing at the moment but also what they have seen and learned from the past during the search dynamics. The promising results indicate that our information scent models with different input variations were better, compared to the state-of-the-art neural click models, at predicting some search behaviors. When incorporating the knowledge from a previous query in the same search session, the prediction of current query abandonment, pagination, and information gain has been improved. Compared to the well-known neural click models that model search behaviors under a single search query thread, this study takes a broader view to consider an entire search session which may contain multiple queries. More importantly, our model takes the search result relevance pattern on the Search Engine Results Pages (SERP) as a whole as the information scent input to the deep learning model, instead of considering one search result at each step. The results offer insights into the impact of information scents on how people forage for information, which has implications for designing or refining a set of design guidelines for search engines.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126853292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAISoN 2019: The 3rd International Workshop on Mining Actionable Insights from Social Networks","authors":"Marcelo G. Armentano, E. Bagheri, Julia Kiseleva, Frank W. Takes","doi":"10.1145/3341981.3350529","DOIUrl":"https://doi.org/10.1145/3341981.3350529","url":null,"abstract":"A lot of research in social network mining is concerned with theories and methodologies for community discovery, pattern detection and network evolution, as well as behavioural analysis and anomaly (misbehaviour) detection. The MAISoN workshop focuses on the use of social network data and methods for building predictive models that can be used to uncover hidden and unexpected aspects of user-generated content in order to extract actionable insights. The objective is to explore ways in which insights can be transformed into effective actions that can help organizations improve and refine their activities. Thus, the focus is on social network analysis and mining techniques for gaining actionable real-world insights. The 3rd International Workshop on Mining Actionable Insights from Social Networks (MAISoN 2019) was a half day workshop co-located with ICTIR 2019, the 5th ACM SIGIR International Conference on the Theory of Information Retrieval which took place from October 2 to 5, 2019 in Santa Clara, California, United States.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132229130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Factored Relevance Model for Contextual Point-of-Interest Recommendation","authors":"Anirban Chakraborty, Debasis Ganguly, A. Caputo, S. Lawless","doi":"10.1145/3341981.3344230","DOIUrl":"https://doi.org/10.1145/3341981.3344230","url":null,"abstract":"The challenge of providing personalized and contextually appropriate recommendations to a user is faced in a range of use-cases, e.g., recommendations for movies, places to visit, articles to read etc. In this paper, we focus on one such application, namely that of suggesting 'points of interest' (POIs) to a user given her current location, by leveraging relevant information from her past preferences. An automated contextual recommendation algorithm is likely to work well if it can extract information from the preference history of a user (exploitation) and effectively combine it with information from the user's current context (exploration) to predict an item's 'usefulness' in the new context. To balance this trade-off between exploration and exploitation, we propose a generic unsupervised framework involving a factored relevance model (FRLM), comprising two distinct components, one corresponding to the historical information from past contexts, and the other pertaining to the information from the local context. Our experiments are conducted on the TREC contextual suggestion (TREC-CS) 2016 dataset. The results of our experiments demonstrate the effectiveness of our proposed approach in comparison to a number of standard IR and recommender-based baselines.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130840365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning a Better Negative Sampling Policy with Deep Neural Networks for Search","authors":"Daniel Cohen, Scott M. Jordan, W. Bruce Croft","doi":"10.1145/3341981.3344220","DOIUrl":"https://doi.org/10.1145/3341981.3344220","url":null,"abstract":"In information retrieval, sampling methods used to select documents for neural models must often deal with large class imbalances during training. This issue necessitates careful selection of negative instances when training neural models to avoid the risk of overfitting. For most work, heuristic sampling approaches, or policies, are created based on domain expertise, such as choosing samples with high BM25 scores or a random process over candidate documents. However, these sampling approaches are done with the test distribution in mind. In this paper, we demonstrate that the method chosen to sample negative documents during training plays a critical role in both the stability of training and overall performance. Furthermore, we establish that using reinforcement learning to optimize a policy over a set of sampling functions can significantly improve performance over standard training practices with respect to IR metrics and is robust to hyperparameters and random seeds.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114615598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Utilizing Passages in Fusion-based Document Retrieval","authors":"Haggai Roitman, Y. Mass","doi":"10.1145/3341981.3344212","DOIUrl":"https://doi.org/10.1145/3341981.3344212","url":null,"abstract":"The usage of passage-level information has been successfully demonstrated in many core IR tasks, and among such tasks, the task of passage-based document retrieval. In this work, we study the merits of utilizing similar information for the fusion-based document retrieval task. Overall, we show that such information can be highly useful for this task as well. To this end, we propose three passage-based fusion methods and show that their performance can transcend that of strong document-level fusion methods.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114485531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study of Query Performance Prediction for Answer Quality Determination","authors":"Haggai Roitman, Shai Erera, Guy Feigenblat","doi":"10.1145/3341981.3344219","DOIUrl":"https://doi.org/10.1145/3341981.3344219","url":null,"abstract":"We study a constrained retrieval setting in which either a single qualitative answer is provided as a response to a user-query or none. Given a user-query and the \"best\" answer that was retrieved from the underlying search engine, we wish to determine whether or not to accept it. To address this challenge, we propose an answer quality determination approach which leverages a novel set of answer-level query performance prediction (QPP) features, derived from a couple of recent discriminative QPP frameworks. Using various search benchmarks with both ad-hoc retrieval and non-factoid question answering (QA) tasks, we demonstrate the effectiveness of our approach.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116145685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Listwise Neural Ranking Models","authors":"Razieh Rahimi, Ali Montazeralghaem, J. Allan","doi":"10.1145/3341981.3344245","DOIUrl":"https://doi.org/10.1145/3341981.3344245","url":null,"abstract":"Several neural networks have been developed for end-to-end training of information retrieval models. These networks differ in many aspects including architecture, training data, data representations, and loss functions. However, only pointwise and pairwise loss functions are employed in training of end-to-end neural ranking models without human-engineered features. These loss functions do not consider the ranks of documents in the estimation of loss over training data. Because of this limitation, conventional learning-to-rank models using pointwise or pairwise loss functions have generally shown lower performance compared to those using listwise loss functions. Following this observation, we propose to employ listwise loss functions for the training of neural ranking models. We empirically demonstrate that a listwise neural ranker outperforms a pairwise neural ranking model. In addition, we achieve further improvements in the performance of the listwise neural ranking models by query-based sampling of training data.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125548707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative Adversarial Networks in Precision Oncology","authors":"Leandro von Werra, Marcel Schöngens, E. Uzun, Carsten Eickhoff","doi":"10.1145/3341981.3344238","DOIUrl":"https://doi.org/10.1145/3341981.3344238","url":null,"abstract":"Precision medicine strives to deliver improved care based on genetic patient information. Towards this end, it is crucial to find effective data representations on which to perform matching and inference operations. We develop and evaluate a generative adversarial neural network (GAN) approach to representation learning with the goal of patient-centric literature retrieval and treatment recommendation in precision oncology. Several large-scale corpora including the COSMIC Cancer Gene Census, COSMIC Mutation Data, Genomic Data Commons (GDC) and 26M MEDLINE abstracts are used to train GANs for synthesizing genetic mutation patterns that likely correspond to patient properties such as their demographics or cancer type. The introduction of GANs into the literature retrieval and treatment recommendation process results in significant improvements in performance by increasing the recall of a range of methods at stable precision. Finally, we propose a method to discover novel gene-gene interaction hypotheses to guide future research.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126282050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Significance Testing in Theory and in Practice","authors":"Ben Carterette","doi":"10.1145/3341981.3358959","DOIUrl":"https://doi.org/10.1145/3341981.3358959","url":null,"abstract":"The past 25 years have seen a great improvement in the rigor of experimentation on information access problems. This is due primarily to three factors: high-quality, public, portable test collections such as those produced by TREC (the Text REtrieval Conference), the increased ease of online A/B testing on large user populations, and the increased practice of statistical hypothesis testing to determine whether observed improvements can be ascribed to something other than random chance. Together these create a very useful standard for reviewers, program committees, and journal editors; work on information access (IA) problems such as search and recommendation increasingly cannot be published unless it has been evaluated offline using a well-constructed test collection or online on a large user base and shown to produce a statistically significant improvement over a good baseline. But, as the saying goes, any tool sharp enough to be useful is also sharp enough to be dangerous. Statistical tests of significance are widely misunderstood. Most researchers and developers treat them as a \"black box\": evaluation results go in and a p-value comes out. But because significance is such an important factor in determining what directions to explore and what is published or deployed, using p-values obtained without thought can have consequences for everyone working in IA. Ioannidis has argued that the main consequence in the biomedical sciences is that most published research findings are false; could that be the case for IA as well?","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114442169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}