{"title":"The 8th Symposium on Future Directions in Information Access","authors":"Haiming Liu, Ingo Frommholz, I. Schmitt, D. Song","doi":"10.1145/3234944.3237193","DOIUrl":"https://doi.org/10.1145/3234944.3237193","url":null,"abstract":"The 8th PhD Symposium on Future Directions in Information Access (FDIA) will be held in conjunction with the 8th International Conference on the Theory of Information Retrieval (ICTIR 2018) in Tianjin, China. The symposium aims to provide a forum for early-stage researchers, such as PhD students, to share their research and interact with each other and with senior researchers in an informal and relaxed atmosphere. The symposium provides an excellent opportunity for the participants to promote their work and gain experience in presenting and communicating their research. The participants will learn about different topics in the area of information access and retrieval, receive feedback on their work, meet many peers, and hear inspiring talks.","PeriodicalId":193631,"journal":{"name":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115711753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","authors":"","doi":"10.1145/3234944","DOIUrl":"https://doi.org/10.1145/3234944","url":null,"abstract":"","PeriodicalId":193631,"journal":{"name":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133639261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic Set Size Design for Paired and Unpaired Data","authors":"T. Sakai","doi":"10.1145/3234944.3234971","DOIUrl":"https://doi.org/10.1145/3234944.3234971","url":null,"abstract":"Topic set size design is an approach to determining the sample sizes of an experiment (e.g., number of topics) based on a statistical requirement, namely a desired statistical power or a cap on the confidence interval (CI) width for the difference in means. Previous work considered paired data cases for a desired power of the t-test and for a cap on CI width, as well as unpaired data cases for a desired power of one-way ANOVA. In the present study, we consider unpaired (i.e., two-sample) cases for the t-test and for the CI width. Since one-way ANOVA with two groups is strictly equivalent to the two-sample t-test, we compare the outcomes of the topic set size design results based on these two approaches, and show that the one-way ANOVA-based approach actually returns tighter sample sizes than the two-sample t-test approach. Moreover, we compare the paired and unpaired cases for both t-test-based and CI-based topic set size design approaches. Because estimating the variance of the score differences for the paired data setting is problematic, we recommend the use of our unpaired-data versions of t-test-based and CI-based topic set size design tools, as they only require a variance estimate for individual scores and the appropriate sample sizes for unpaired data are also large enough for paired data.","PeriodicalId":193631,"journal":{"name":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125547153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Performance Prediction of Fusion-based Retrieval","authors":"Haggai Roitman","doi":"10.1145/3234944.3234950","DOIUrl":"https://doi.org/10.1145/3234944.3234950","url":null,"abstract":"We study the query performance prediction (QPP) task for fusion-based retrieval. Within such a retrieval setting, several ranked lists, each one retrieved by a different method, are combined into a single (fused) ranked list. A common prediction approach is to treat the (base) ranked lists as reference lists and combine those lists' QPP estimates according to their similarity with the fused-list. Yet, we identify a gap in the way that relevance-dependent aspects of inter-list relationships are modeled within such an approach. Aiming to address this gap, we derive an enhanced estimation approach which results in a more accurate prediction.","PeriodicalId":193631,"journal":{"name":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130207917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using PageRank for Characterizing Topic Quality in LDA","authors":"Sujatha Das Gollapalli, Xiaoli Li","doi":"10.1145/3234944.3234955","DOIUrl":"https://doi.org/10.1145/3234944.3234955","url":null,"abstract":"Topic models based on Latent Dirichlet Allocation (LDA) are employed effectively in various information retrieval and data mining tasks. Despite their popularity and wide-spread application, the question of assessing the quality of topics extracted by LDA models is still not completely resolved. While various measures have been proposed to quantify the thematic coherence and interpretability of a topic extracted by LDA, they do not address this problem sufficiently. We observe that existing quality measures select top topic words based on their topic-word co-occurrence without considering word co-occurrences within the same context. We incorporate precisely this information by constructing topic-specific graphs capturing neighborhood of words in an LDA modeled corpus. Next, the PageRank algorithm is applied on these graphs to assign word importance scores based on centrality. We propose two measures to compute topic quality: (1) the Aggregate PageRank of Top-words of a topic and (2) the PageRank Centralization Index of a topic-specific word graph. Our experiments across three datasets show that unlike existing quality measures, our proposed measures are able to identify topics that are discriminative as well as interpretable and yield superior performance on both classification and intruder word identification tasks.","PeriodicalId":193631,"journal":{"name":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"90 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120843301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond Greedy Search: Pruned Exhaustive Search for Diversified Result Ranking","authors":"Yingying Wu, Yiqun Liu, Fei Chen, Min Zhang, Shaoping Ma","doi":"10.1145/3234944.3234967","DOIUrl":"https://doi.org/10.1145/3234944.3234967","url":null,"abstract":"As a search query can correspond to multiple intents, search result diversification aims at returning a single result list that could satisfy as many users' information needs as possible. However, determining the optimal ranking list is NP-hard. Several algorithms have been proposed to obtain a locally optimal ranking with greedy approximations. In this paper, we propose a pruned exhaustive method to generate better solutions than the greedy search. Our approach is based on the observations that there are fewer than ten subtopics for most queries, most relevant results cover only a few subtopics, and most search users only focus on the top results. The proposed pruned exhaustive search algorithm based on ordered pairs (PesOP) finds the optimal solution efficiently. Experimental results based on TREC Diversity and NTCIR Intent task datasets show that PesOP outperforms greedy strategies with better diversification performance. Compared with the original non-pruned exhaustive search, the PesOP algorithm decreases the computational cost while maintaining optimality.","PeriodicalId":193631,"journal":{"name":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128163318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Utilizing Pseudo-Relevance Feedback in Fusion-based Retrieval","authors":"Haggai Roitman","doi":"10.1145/3234944.3234969","DOIUrl":"https://doi.org/10.1145/3234944.3234969","url":null,"abstract":"The usage of positive relevance feedback in fusion-based retrieval was previously shown to be very useful. Yet, in many retrieval use-cases, no actual relevance feedback may be available. In the absence of relevance data, pseudo-relevance feedback models have been suggested as an alternative. Encouraged by the previous success of using positive relevance feedback in fusion-based retrieval, in this work, we study the usage of pseudo-relevance feedback in this setting as well. We build on top of an existing approach that was originally designed for utilizing positive relevance feedback and adapt it to pseudo-relevance feedback. To this end, we propose a novel approach for estimating document (pseudo) relevance labels. Our labeling approach is better tailored to the fusion-based retrieval setting and provides favorable retrieval quality results.","PeriodicalId":193631,"journal":{"name":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115839700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pseudo Descriptions for Meta-Data Retrieval","authors":"Tim Gollub, E. Genc, Nedim Lipka, Benno Stein","doi":"10.1145/3234944.3234957","DOIUrl":"https://doi.org/10.1145/3234944.3234957","url":null,"abstract":"Search in meta-data is challenging due to the sparsity of the available textual information. To alleviate the sparsity problem, the paper at hand evolves from the existing document expansion paradigm and proposes pseudo-descriptions as a new paradigm. Instead of encoding paradigmatic term relations implicitly in an expansion vector, we generate an explicit cohesive text field for meta-data records that describes the entity associated with the record. In contrast to document expansions, pseudo-descriptions make it possible to reveal why a certain document is considered relevant although the original meta-data does not contain the query terms. Moreover, they are easier to operationalize and facilitate the use of sophisticated retrieval features such as phrase search and query term proximity. To generate pseudo-descriptions, we propose a relevance-dependent strategy that depends on the search engine result pages obtained from issuing the meta-data as a search query to a designated reference search engine. To demonstrate the validity of the pseudo-description paradigm, we experiment with different TREC collections where we withhold the content information to simulate a meta-data retrieval scenario. Though retrieval with full content information remains superior, our approach achieves retrieval performance improvements on par with document expansion.","PeriodicalId":193631,"journal":{"name":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114312499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Entity Retrieval in the Knowledge Graph with Hierarchical Entity Type and Content","authors":"Xinshi Lin, Wai Lam, K. Lai","doi":"10.1145/3234944.3234963","DOIUrl":"https://doi.org/10.1145/3234944.3234963","url":null,"abstract":"We investigate the task of ad-hoc entity retrieval from a knowledge graph with hierarchical entity types and entity descriptions. Our model directly encodes them into a Markov random field based framework via a path aware smoothing method. We conduct experiments on recent benchmark datasets and investigate the incorporation of the Wikipedia type and article information. The results show that our framework achieves improvements over the existing and state-of-the-art models.","PeriodicalId":193631,"journal":{"name":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122761373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Recommender System for Computational Serendipity","authors":"Xi Niu","doi":"10.1145/3234944.3234974","DOIUrl":"https://doi.org/10.1145/3234944.3234974","url":null,"abstract":"Serendipity is recognized as very challenging to simulate and stimulate in recommender systems. In this paper, we adopt a novel approach to model and implement serendipity in the context of a health news recommender system. The proposed conceptual framework for serendipity consists of a surprise component, a value component, and a learning component. The three components work together to reason about what information is serendipitous, defined as both surprising and valuable to a user. The implementation is through a series of computational approaches, resulting in a prototype called \"StumbleOn\". We find that the computational approaches help identify serendipitous recommendations, which are further improved by adaptively learning from users' real-time feedback. This study contributes to the research on how to generate serendipity for users in a predictable and systematic way.","PeriodicalId":193631,"journal":{"name":"Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114990991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}