Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval: Latest Articles

The 8th Symposium on Future Directions in Information Access

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3237193

Haiming Liu, Ingo Frommholz, I. Schmitt, D. Song

Citations: 0

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944

Citations: 2

Topic Set Size Design for Paired and Unpaired Data

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234971

T. Sakai

Abstract: Topic set size design is an approach to determining the sample sizes of an experiment (e.g., number of topics) based on a statistical requirement, namely a desired statistical power or a cap on the confidence interval (CI) width for the difference in means. Previous work considered paired data cases for a desired power of the t-test and for a cap on CI width, as well as unpaired data cases for a desired power of one-way ANOVA. In the present study, we consider unpaired (i.e., two-sample) cases for the t-test and for the CI width. Since one-way ANOVA with two groups is strictly equivalent to the two-sample t-test, we compare the outcomes of the topic set size design results based on these two approaches, and show that the one-way ANOVA-based approach actually returns tighter sample sizes than the two-sample t-test approach. Moreover, we compare the paired and unpaired cases for both t-test-based and CI-based topic set size design approaches. Because estimating the variance of the score differences for the paired data setting is problematic, we recommend the use of our unpaired-data versions of t-test-based and CI-based topic set size design tools, as they only require a variance estimate for individual scores and the appropriate sample sizes for unpaired data are also large enough for paired data.
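As a rough illustration of the sample-size reasoning in the abstract above, the following Python sketch uses the textbook normal approximation for a two-sided test. It is not necessarily the computation used in the paper's tools. The function names, the example variance (0.04), the minimum detectable difference (0.1), and the correlation parameter rho are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """Sum of the two standard normal quantiles used by the approximation."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def unpaired_topics_per_system(sigma2: float, min_diff: float,
                               alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate topics needed per system for a two-sample comparison.

    sigma2 is the variance of individual scores (assumed equal for both
    systems); min_diff is the smallest difference in means worth detecting.
    """
    z = _z_sum(alpha, power)
    return math.ceil(2 * sigma2 * z ** 2 / min_diff ** 2)


def paired_topics(sigma2: float, min_diff: float, rho: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate topics needed for a paired comparison.

    rho is a hypothetical correlation between the two systems' scores;
    the variance of the per-topic differences is 2 * sigma2 * (1 - rho).
    """
    z = _z_sum(alpha, power)
    sigma2_diff = 2 * sigma2 * (1 - rho)
    return math.ceil(sigma2_diff * z ** 2 / min_diff ** 2)


# Illustrative values only, not taken from the paper.
n_unpaired = unpaired_topics_per_system(sigma2=0.04, min_diff=0.1)
n_paired = paired_topics(sigma2=0.04, min_diff=0.1, rho=0.5)
print(n_unpaired, n_paired)
```

Under this approximation, whenever the correlation between the two systems' scores is non-negative, the variance of the differences cannot exceed twice the variance of the individual scores, so the unpaired figure is never smaller than the paired one.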

Citations: 1

Enhanced Performance Prediction of Fusion-based Retrieval

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234950

Haggai Roitman

Citations: 9

Using PageRank for Characterizing Topic Quality in LDA

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234955

Sujatha Das Gollapalli, Xiaoli Li

Abstract: Topic models based on Latent Dirichlet Allocation (LDA) are employed effectively in various information retrieval and data mining tasks. Despite their popularity and wide-spread application, the question of assessing the quality of topics extracted by LDA models is still not completely resolved. While various measures have been proposed to quantify the thematic coherence and interpretability of a topic extracted by LDA, they do not address this problem sufficiently. We observe that existing quality measures select top topic words based on their topic-word co-occurrence without considering word co-occurrences within the same context. We incorporate precisely this information by constructing topic-specific graphs capturing neighborhood of words in an LDA modeled corpus. Next, the PageRank algorithm is applied on these graphs to assign word importance scores based on centrality. We propose two measures to compute topic quality: (1) the Aggregate PageRank of Top-words of a topic and (2) the PageRank Centralization Index of a topic-specific word graph. Our experiments across three datasets show that unlike existing quality measures, our proposed measures are able to identify topics that are discriminative as well as interpretable and yield superior performance on both classification and intruder word identification tasks.
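The general idea of scoring words by PageRank on a co-occurrence graph can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' construction: the window-based graph, the toy token lists, the function names, and the choice to aggregate by a plain sum over the top words are all hypothetical.

```python
from collections import defaultdict


def build_word_graph(docs, window=2):
    """Undirected weighted graph linking words that occur within `window`
    positions of each other (an assumed notion of word neighborhood)."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + 1 + window]:
                if other != word:
                    graph[word][other] += 1.0
                    graph[other][word] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=100):
    """Plain power-iteration PageRank on a weighted graph.

    Every node here has at least one edge, so no dangling-node handling
    is needed for graphs produced by build_word_graph.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            for neighbor, weight in graph[node].items():
                new_rank[neighbor] += damping * rank[node] * weight / out_weight[node]
        rank = new_rank
    return rank


def aggregate_pagerank(rank, top_words):
    """Sum of PageRank scores over a topic's top words (assumed aggregation)."""
    return sum(rank.get(word, 0.0) for word in top_words)


# Toy documents standing in for text assigned to one topic (illustrative).
docs = [
    ["search", "engine", "query"],
    ["search", "query", "ranking"],
    ["search", "engine", "index"],
]
rank = pagerank(build_word_graph(docs))
central_score = aggregate_pagerank(rank, ["search", "engine"])
peripheral_score = aggregate_pagerank(rank, ["ranking", "index"])
print(central_score, peripheral_score)
```

In this toy graph, a topic whose top words sit at the center of the co-occurrence structure receives a higher aggregate score than one whose top words are peripheral, which is the intuition the abstract describes.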

Citations: 7

Beyond Greedy Search: Pruned Exhaustive Search for Diversified Result Ranking

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234967

Yingying Wu, Yiqun Liu, Fei Chen, Min Zhang, Shaoping Ma

Citations: 3

Utilizing Pseudo-Relevance Feedback in Fusion-based Retrieval

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234969

Haggai Roitman

Citations: 3

Pseudo Descriptions for Meta-Data Retrieval

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234957

Tim Gollub, E. Genc, Nedim Lipka, Benno Stein

Abstract: Search in meta-data is challenging due to the sparsity of the available textual information. To alleviate the sparsity problem, the paper at hand evolves from the existing document expansion paradigm and proposes pseudo-descriptions as a new paradigm. Instead of encoding paradigmatic term relations implicitly in an expansion vector, we generate an explicit cohesive text field for meta-data records that describes the entity associated with the record. In contrast to document expansions, pseudo-descriptions make it possible to reveal why a certain document is considered relevant although the original meta-data does not contain the query terms. Moreover, they are easier to operationalize and facilitate the use of sophisticated retrieval features such as phrase search and query term proximity. To generate pseudo-descriptions, we propose a relevance dependent strategy that depends on the search engine result pages obtained from issuing the meta-data as a search query to a designated reference search engine. To demonstrate the validity of the pseudo-description paradigm, we experiment with different TREC collections where we withhold the content information to simulate a meta-data retrieval scenario. Though retrieval with full content information remains superior, our approach achieves retrieval performance improvements on par with document expansion.

Citations: 1

Entity Retrieval in the Knowledge Graph with Hierarchical Entity Type and Content

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234963

Xinshi Lin, Wai Lam, K. Lai

Citations: 6

An Adaptive Recommender System for Computational Serendipity

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234974

Xi Niu

Citations: 8