Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval: Latest Papers

Learning Aligned Cross-Modal and Cross-Product Embeddings for Generating the Topics of Shopping Needs
Yi-Ru Tsai, Pu-Jen Cheng
{"title":"Learning Aligned Cross-Modal and Cross-Product Embeddings for Generating the Topics of Shopping Needs","authors":"Yi-Ru Tsai, Pu-Jen Cheng","doi":"10.1145/3578337.3605133","DOIUrl":"https://doi.org/10.1145/3578337.3605133","url":null,"abstract":"The paper addresses the issue of generating keywords to describe the topic of a shopping need based on the titles and photos of products being browsed or compared. We extend to learn cross-modal and cross-product embeddings to capture the relationships between textual and visual semantics and the shared features between comparable products. Experiments conducted on 3 real-world datasets have shown that the keywords decoded from such embeddings gain significant improvement compared to state-of-the-art cross-modal embeddings.","PeriodicalId":415621,"journal":{"name":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127489307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
KALE: Using a K-Sparse Projector for Lexical Expansion
Luís Borges, Bruno Martins, Jamie Callan
{"title":"KALE: Using a K-Sparse Projector for Lexical Expansion","authors":"Luís Borges, Bruno Martins, Jamie Callan","doi":"10.1145/3578337.3605131","DOIUrl":"https://doi.org/10.1145/3578337.3605131","url":null,"abstract":"Recent research has proposed retrieval approaches based on sparse representations and inverted indexes, with terms produced by neural language models and leveraging the advantages from both neural retrieval and lexical matching. This paper proposes KALE, a new lightweight method of this family that uses a small model with a k-sparse projector to convert dense representations into a sparse set of entries from a latent vocabulary. The KALE vocabulary captures semantic concepts that perform well when used in isolation, and perform better when extending the original lexical vocabulary, this way improving first-stage retrieval accuracy. Experiments with the MSMARCOv1 passage retrieval dataset, the TREC Deep Learning dataset, and BEIR datasets, examined the effectiveness of KALE under varying conditions. Results show that the KALE terms can replace the original lexical vocabulary, with gains in accuracy and efficiency. Combining KALE with the original lexical vocabulary, or with other learned terms, can further improve retrieval accuracy with only a modest increase in computational cost.","PeriodicalId":415621,"journal":{"name":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131512595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Theoretical Analysis of Out-of-Distribution Detection in Multi-Label Classification
Dell Zhang, Bilyana Taneva-Popova
{"title":"A Theoretical Analysis of Out-of-Distribution Detection in Multi-Label Classification","authors":"Dell Zhang, Bilyana Taneva-Popova","doi":"10.1145/3578337.3605116","DOIUrl":"https://doi.org/10.1145/3578337.3605116","url":null,"abstract":"The ability to detect out-of-distribution (OOD) inputs is essential for safely deploying machine learning models in an open world. Most existing research on OOD detection, and more generally uncertainty quantification, has focused on multi-class classification. However, for many information retrieval (IR) applications, the classification of documents or images is by nature not multi-class but multi-label. This paper presents a pure theoretical analysis of the under-explored problem of OOD detection in multi-label classification using deep neural networks. First, we examine main existing approaches such as MSP (proposed in ICLR-2017) and MaxLogit (proposed in ICML-2022), and summarize them as different combinations of label-wise scoring and aggregation functions. Some existing methods are shown to be equivalent. Then, we prove that JointEnergy (proposed in NeurIPS-2021) is indeed the optimal probabilistic solution when the class labels are conditionally independent with each other for any given data sample. This provides a more rigorous explanation for the effectiveness of JointEnergy than the original joint-likelihood interpretation, and also reveals its reliance upon the assumption of label independence rather than the exploitation of label relationships as previously thought. 
Finally, we discuss potential future research directions in this area.","PeriodicalId":415621,"journal":{"name":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133083828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CSurF: Sparse Lexical Retrieval through Contextualized Surface Forms
Zhen Fan, Luyu Gao, Jamie Callan
{"title":"CSurF: Sparse Lexical Retrieval through Contextualized Surface Forms","authors":"Zhen Fan, Luyu Gao, Jamie Callan","doi":"10.1145/3578337.3605126","DOIUrl":"https://doi.org/10.1145/3578337.3605126","url":null,"abstract":"Lexical exact-match systems perform text retrieval efficiently with sparse matching signals and fast retrieval through inverted lists, but naturally suffer from the mismatch between lexical surface form and implicit term semantics. This paper proposes to directly bridge the surface form space and the term semantics space in lexical exact-match retrieval via contextualized surface forms (CSF). Each CSF pairs a lexical surface form with a context source, and is represented by a lexical form weight and a contextualized semantic vector representation. This framework is able to perform sparse lexicon-based retrieval by learning to represent each query and document as a \"bag-of-CSFs\", simultaneously addressing two key factors in sparse retrieval: vocabulary expansion of surface form and semantic representation of term meaning. At retrieval time, it efficiently matches CSFs through exact-match of learned surface forms, and effectively scores each CSF pair via contextual semantic representations, leading to joint improvement in both term match and term scoring. 
Multiple experiments show that this approach successfully resolves the main mismatch issues in lexical exact-match retrieval and outperforms state-of-the-art lexical exact-match systems, reaching comparable accuracy as lexical all-to-all soft match systems as an efficient exact-match-based system.","PeriodicalId":415621,"journal":{"name":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125536877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Balanced Knowledge Distillation with Contrastive Learning for Document Re-ranking
Yingrui Yang, Shanxiu He, Yifan Qiao, Wentai Xie, Tao Yang
{"title":"Balanced Knowledge Distillation with Contrastive Learning for Document Re-ranking","authors":"Yingrui Yang, Shanxiu He, Yifan Qiao, Wentai Xie, Tao Yang","doi":"10.1145/3578337.3605120","DOIUrl":"https://doi.org/10.1145/3578337.3605120","url":null,"abstract":"Knowledge distillation is commonly used in training a neural document ranking model by employing a teacher to guide model refinement. As a teacher may not be correct in all cases, over-calibration between the student and teacher models can make training less effective. This paper focuses on the KL divergence loss used for knowledge distillation in document re-ranking, and re-visits balancing of knowledge distillation with explicit contrastive learning. The proposed loss function takes a conservative approach in imitating teacher's behavior, and allows student to deviate from a teacher's model sometimes through training. This paper presents analytic results with an evaluation on MS MARCO passages to validate the usefulness of the proposed loss for the transformer-based ColBERT re-ranking.","PeriodicalId":415621,"journal":{"name":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"213 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131700012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards Query Performance Prediction for Neural Information Retrieval: Challenges and Opportunities
G. Faggioli, Thibault Formal, Simon Lupart, S. Marchesin, S. Clinchant, N. Ferro, Benjamin Piwowarski
{"title":"Towards Query Performance Prediction for Neural Information Retrieval: Challenges and Opportunities","authors":"G. Faggioli, Thibault Formal, Simon Lupart, S. Marchesin, S. Clinchant, N. Ferro, Benjamin Piwowarski","doi":"10.1145/3578337.3605142","DOIUrl":"https://doi.org/10.1145/3578337.3605142","url":null,"abstract":"In this work, we propose a novel framework to devise features that can be used by Query Performance Prediction (QPP) models for Neural Information Retrieval (NIR). Using the proposed framework as a periodic table of QPP components, practitioners can devise new predictors better suited for NIR. Through the framework, we detail what challenges and opportunities arise for QPPs at different stages of the NIR pipeline. We show the potential of the proposed framework by using it to devise two types of novel predictors. The first one, named MEMory-based QPP (MEM-QPP), exploits the similarity between test and train queries to measure how much a NIR system can memorize. The second adapts traditional QPPs into NIR-oriented ones by computing the query-corpus semantic similarity. 
By exploiting the inherent nature of NIR systems, the proposed predictors overcome, under various setups, the current State of the Art, highlighting -- at the same time -- the versatility of the framework in describing different types of QPPs.","PeriodicalId":415621,"journal":{"name":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114768997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Learn to be Fair without Labels: A Distribution-based Learning Framework for Fair Ranking
F. Chen, Hui Fang
{"title":"Learn to be Fair without Labels: A Distribution-based Learning Framework for Fair Ranking","authors":"F. Chen, Hui Fang","doi":"10.1145/3578337.3605132","DOIUrl":"https://doi.org/10.1145/3578337.3605132","url":null,"abstract":"Ranking algorithms as an essential component of retrieval systems have been constantly improved in previous studies, especially regarding relevance-based utilities. In recent years, more and more research attempts have been proposed regarding fairness in rankings due to increasing concerns about potential discrimination and the issue of echo chamber. These attempts include traditional score-based methods that allocate exposure resources to different groups using pre-defined scoring functions or selection strategies and learning-based methods that learn the scoring functions based on data samples. Learning-based models are more flexible and achieve better performance than traditional methods. However, most of the learning-based models were trained and tested on outdated datasets where fairness labels are barely available. State-of-art models utilize relevance-based utility scores as a substitute for the fairness labels to train their fairness-aware loss, where plugging in the substitution does not guarantee the minimum loss. This inconsistency challenges the model's accuracy and performance, especially when learning is achieved by gradient descent. Hence, we propose a distribution-based fair learning framework (DLF) that does not require labels by replacing the unavailable fairness labels with target fairness exposure distributions. 
Experimental studies on TREC fair ranking track dataset confirm that our proposed framework achieves better fairness performance while maintaining better control over the fairness-relevance trade-off than state-of-art fair ranking frameworks.","PeriodicalId":415621,"journal":{"name":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114817767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Retrieving Webpages Using Online Discussions
Kevin Ros, Matthew Jin, Jacob Levine, ChengXiang Zhai
{"title":"Retrieving Webpages Using Online Discussions","authors":"Kevin Ros, Matthew Jin, Jacob Levine, ChengXiang Zhai","doi":"10.1145/3578337.3605139","DOIUrl":"https://doi.org/10.1145/3578337.3605139","url":null,"abstract":"Online discussions are a ubiquitous aspect of everyday life. An Internet user who interacts with an online discussion may benefit from seeing hyperlinks to webpages relevant to the discussion because the relevant webpages can provide added context, act as citations for background sources, or condense information so that conversations can proceed seamlessly at a high level. In this paper, we propose and study a new task of retrieving relevant webpages given an online discussion. We frame the task as a novel retrieval problem where we treat a sequence of comments in an online discussion as a query and use such a query to retrieve relevant webpages. We construct a new data set using Reddit, an online discussion forum, to study this new problem. We explore and evaluate multiple representative retrieval methods to examine their effectiveness for solving this new problem. We also propose to leverage the comments that contain hyperlinks as training data to enable supervised learning and further improve retrieval performance. We find that results using modern retrieval methods are promising and that leveraging comments with hyperlinks as training data can further improve performance. 
We release our data set and code to enable additional research in this direction.","PeriodicalId":415621,"journal":{"name":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124401396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Clarifying Questions in Math Information Retrieval
Behrooz Mansouri, Zahra Jahedibashiz
{"title":"Clarifying Questions in Math Information Retrieval","authors":"Behrooz Mansouri, Zahra Jahedibashiz","doi":"10.1145/3578337.3605123","DOIUrl":"https://doi.org/10.1145/3578337.3605123","url":null,"abstract":"One of the challenges of math information retrieval is the inherent ambiguity of mathematical notation. The use of various notations, symbols, and conventions can lead to ambiguities in math search queries, potentially causing confusion and errors. Therefore, asking clarifying questions in math search can help remove these ambiguities. Despite advances in incorporating clarifying questions for search, little is currently understood about the characteristics of these questions in math. This paper investigates math clarifying questions asked on the MathStackExchange community question answering platform, analyzing a total of 495,431 clarifying questions and their usefulness. The results of the analysis uncover specific patterns in useful clarifying questions that provide insight into the design considerations for future conversational math search systems. The formulae used in clarifying questions are closely related to those in the initial queries and are accompanied by common phrases, seeking for the missing information related to the formulae. 
Additionally, experiments utilizing clarifying questions for math search demonstrate the potential benefits of incorporating them alongside the original query.","PeriodicalId":415621,"journal":{"name":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"229 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115588697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic Hint Generation
A. Jatowt, Calvin Gehrer, Michael Färber
{"title":"Automatic Hint Generation","authors":"A. Jatowt, Calvin Gehrer, Michael Färber","doi":"10.1145/3578337.3605119","DOIUrl":"https://doi.org/10.1145/3578337.3605119","url":null,"abstract":"At times when answers to user questions are readily and easily available (at essentially zero cost), it is important for humans to maintain their knowledge and strong reasoning capabilities. We believe that in many cases providing hints rather than final answers should be sufficient and beneficial for users as it requires thinking and stimulates learning as well as remembering processes. We propose in this paper a novel task of automatic hint generation that supports users in finding the correct answers to their questions without the need of looking the answers up. As the first attempt towards this new task, we design and implement an approach that uses Wikipedia to automatically provide hints for any input question-answer pair. We then evaluate our approach with a user group of 10 persons and demonstrate that the generated hints help users successfully answer more questions than when provided with baseline hints.","PeriodicalId":415621,"journal":{"name":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"265 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124212084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0