{"title":"Reproducibility, Replicability, and Insights into Dense Multi-Representation Retrieval Models: from ColBERT to Col*","authors":"Xiao Wang, C. Macdonald, N. Tonellotto, I. Ounis","doi":"10.1145/3539618.3591916","DOIUrl":"https://doi.org/10.1145/3539618.3591916","url":null,"abstract":"Dense multi-representation retrieval models, exemplified as ColBERT, estimate the relevance between a query and a document based on the similarity of their contextualised token-level embeddings. Indeed, by using contextualised token embeddings, dense retrieval, conducted as either exact or semantic matches, can result in increased effectiveness for both in-domain and out-of-domain retrieval tasks, indicating that it is an important model to study. However, the exact role that these semantic matches play is not yet well investigated. For instance, although tokenisation is one of the crucial design choices for various pretrained language models, its impact on the matching behaviour has not been examined in detail. In this work, we inspect the reproducibility and replicability of the contextualised late interaction mechanism by extending ColBERT to Col⋆ which implements the late interaction mechanism across various pretrained models and different types of tokenisers. As different tokenisation methods can directly impact the matching behaviour within the late interaction mechanism, we study the nature of matches occurring in different Col⋆ models, and further quantify the contribution of lexical and semantic matching on retrieval effectiveness. Overall, our experiments successfully reproduce the performance of ColBERT on various query sets, and replicate the late interaction mechanism upon different pretrained models with different tokenisers. Moreover, our experimental results yield new insights, such as: (i) semantic matching behaviour varies across different tokenisers; (ii) more specifically, high-frequency tokens tend to perform semantic matching than other token families; (iii) late interaction mechanism benefits more from lexical matching than semantic matching; (iv) special tokens, such as [CLS], play a very important role in late interaction.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122214266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging Quantitative and Qualitative Digital Experience Testing","authors":"Ranjitha Kumar","doi":"10.1145/3539618.3591873","DOIUrl":"https://doi.org/10.1145/3539618.3591873","url":null,"abstract":"Digital user experiences are a mainstay of modern communication and commerce; multi-billion dollar industries have arisen around optimizing digital design. Usage analytics and A/B testing solutions allow growth hackers to quantitatively compute conversion over key user journeys, while user experience (UX) testing platforms enable UX researchers to qualitatively analyze usability and brand perception. Although these workflows are in pursuit of the same objective - producing better UX - the gulf between quantitative and qualitative testing is wide: they involve different stakeholders, and rely on disparate methodologies, budget, data streams, and software tools. This gap belies the opportunity to create a single platform that optimizes digital experiences holistically: using quantitative methods to uncover what and how much and qualitative analysis to understand why. Such a platform could monitor conversion funnels, identify anomalous behaviors, intercept live users exhibiting those behaviors, and solicit explicit feedback in situ. This feedback could take many forms: survey responses, screen recordings of participants performing tasks, think-aloud audio, and more. By combining data from multiple users and correlating across feedback types, the platform could surface not just insights that a particular conversion funnel had been affected, but hypotheses about what had caused the change in user behavior. The platform could then rank these insights by how often the observed behavior occurred in the wild, using large-scale analytics to contextualize the results from small-scale UX tests. To this end, a decade of research has focused on interaction mining: a set of techniques for capturing interaction and design data from digital artifacts, and aggregating these multimodal data streams into structured representations bridging quantitative and qualitative experience testing[1-4]. During user sessions, interaction mining systems record user interactions (e.g., clicks, scrolls, text input), screen captures, and render-time data structures (e.g., website DOMs, native app view hierarchies). Once captured, these data streams are aligned and combined into user traces, sequences of user interactions semanticized by the design data of their UI targets [5]. The structure of these traces affords new workflows for composing quantitative and qualitative methods, building toward a unified platform for optimizing digital experiences.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115314298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainty-aware Consistency Learning for Cold-Start Item Recommendation","authors":"Taichi Liu, Chen Gao, Zhenyu Wang, Dong Li, Jianye Hao, Depeng Jin, Yong Li","doi":"10.1145/3539618.3592078","DOIUrl":"https://doi.org/10.1145/3539618.3592078","url":null,"abstract":"Graph Neural Network (GNN)-based models have become the mainstream approach for recommender systems. Despite the effectiveness, they are still suffering from the cold-start problem, i.e., recommend for few-interaction items. Existing GNN-based recommendation models to address the cold-start problem mainly focus on utilizing auxiliary features of users and items, leaving the user-item interactions under-utilized. However, embeddings distributions of cold and warm items are still largely different, since cold items' embeddings are learned from lower-popularity interactions, while warm items' embeddings are from higher-popularity interactions. Thus, there is a seesaw phenomenon, where the recommendation performance for the cold and warm items cannot be improved simultaneously. To this end, we proposed a Uncertainty-aware Consistency learning framework for Cold-start item recommendation (shorten as UCC) solely based on user-item interactions. Under this framework, we train the teacher model (generator) and student model (recommender) with consistency learning, to ensure the cold items with additionally generated low-uncertainty interactions can have similar distribution with the warm items. Therefore, the proposed framework improves the recommendation of cold and warm items at the same time, without hurting any one of them. Extensive experiments on benchmark datasets demonstrate that our proposed method significantly outperforms state-of-the-art methods on both warm and cold items, with an average performance improvement of 27.6%.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"13 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116779597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ning Pang, Xiang Zhao, Weixin Zeng, Ji Wang, W. Xiao
{"title":"Personalized Federated Relation Classification over Heterogeneous Texts","authors":"Ning Pang, Xiang Zhao, Weixin Zeng, Ji Wang, W. Xiao","doi":"10.1145/3539618.3591748","DOIUrl":"https://doi.org/10.1145/3539618.3591748","url":null,"abstract":"Relation classification detects the semantic relation between two annotated entities from a piece of text, which is a useful tool for structurization of knowledge. Recently, federated learning has been introduced to train relation classification models in decentralized settings. Current methods strive for a strong server model by decoupling the model training at server from direct access to texts at clients while taking advantage of them. Nevertheless, they overlook the fact that clients have heterogeneous texts (i.e., texts with diversely skewed distribution of relations), which renders existing methods less practical. In this paper, we propose to investigate personalized federated relation classification, in which strong client models adapted to their own data are desired. To further meet the challenges brought by heterogeneous texts, we present a novel framework, namely pf-RC, with several optimized designs. It features a knowledge aggregation method that exploits a relation-wise weighting mechanism, and a feature augmentation method that leverages prototypes to adaptively enhance the representations of instances of long-tail relations. We experimentally validate the superiority of pf-RC against competing baselines in various settings, and the results suggest that the tailored techniques mitigate the challenges.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131248216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Profiling and Visualizing Dynamic Pruning Algorithms","authors":"Zhixuan Li, J. Mackenzie","doi":"10.1145/3539618.3591806","DOIUrl":"https://doi.org/10.1145/3539618.3591806","url":null,"abstract":"Efficiently retrieving the top-k documents for a given query is a fundamental operation in many search applications. Dynamic pruning algorithms accelerate top-k retrieval over inverted indexes by skipping documents that are not able to enter the current set of results. However, the performance of these algorithms depends on a number of variables such as the ranking function, the order of documents within the index, and the number of documents to be retrieved. In this paper, we propose a diagnostic framework, Dyno, for profiling and visualizing the performance of dynamic pruning algorithms. Our framework captures processing traces during retrieval, allowing the operations of the index traversal algorithm to be visualized. These visualizations support both query-level and system-to-system comparisons, enabling performance characteristics to be readily understood for different systems. Dyno benefits both academics and practitioners by furthering our understanding of the behavior of dynamic pruning algorithms, allowing better design choices to be made during experimentation and deployment.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131760891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representation and Labeling Gap Bridging for Cross-lingual Named Entity Recognition","authors":"Xinghua Zhang, Yu Bowen, Jiangxia Cao, Quangang Li, Xuebin Wang, Tingwen Liu, Hongbo Xu","doi":"10.1145/3539618.3591757","DOIUrl":"https://doi.org/10.1145/3539618.3591757","url":null,"abstract":"Cross-lingual Named Entity Recognition (NER) aims to address the challenge of data scarcity in low-resource languages by leveraging knowledge from high-resource languages. Most current work relies on general multilingual language models to represent text, and then uses classic combined tagging (e.g., B-ORG) to annotate entities; However, this approach neglects the lack of cross-lingual alignment of entity representations in language models, and also ignores the fact that entity spans and types have varying levels of labeling difficulty in terms of transferability. To address these challenges, we propose a novel framework, referred to as DLBri, which addresses the issues of representation and labeling simultaneously. Specifically, the proposed framework utilizes progressive contrastive learning with source-to-target oriented sentence pairs to pre-finetune the language model, resulting in improved cross-lingual entity-aware representations. Additionally, a decomposition-then-combination procedure is proposed, which separately transfers entity span and type, and then combines their information, to reduce the difficulty of cross-lingual entity labeling. Extensive experiments on 13 diverse language pairs confirm the effectiveness of DLBri.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133566344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wei Zhong, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin
{"title":"One Blade for One Purpose: Advancing Math Information Retrieval using Hybrid Search","authors":"Wei Zhong, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin","doi":"10.1145/3539618.3591746","DOIUrl":"https://doi.org/10.1145/3539618.3591746","url":null,"abstract":"Neural retrievers have been shown to be effective for math-aware search. Their ability to cope with math symbol mismatches, to represent highly contextualized semantics, and to learn effective representations are critical to improving math information retrieval. However, the most effective retriever for math remains impractical as it depends on token-level dense representations for each math token, which leads to prohibitive storage demands, especially considering that math content generally consumes more tokens. In this work, we try to alleviate this efficiency bottleneck while boosting math information retrieval effectiveness via hybrid search. To this end, we propose MABOWDOR, a Math-Aware Bestof-Worlds Domain Optimized Retriever, which has an unsupervised structure search component, a dense retriever, and optionally a sparse retriever on top of a domain-adapted backbone learned by context-enhanced pretraining, each addressing a different need in retrieving heterogeneous data from math documents. Our hybrid search outperforms the previous state-of-the-art math IR system while eliminating efficiency bottlenecks. Our system is available at https://github.com/approach0/pya0.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127698750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huiyuan Chen, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Junpeng Wang, Vivian Lai, Mahashweta Das, Hao Yang
{"title":"Sharpness-Aware Graph Collaborative Filtering","authors":"Huiyuan Chen, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Junpeng Wang, Vivian Lai, Mahashweta Das, Hao Yang","doi":"10.1145/3539618.3592059","DOIUrl":"https://doi.org/10.1145/3539618.3592059","url":null,"abstract":"Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, recent studies show that GNNs tend to yield inferior performance when the distributions of training and test data are not aligned well. Moreover, training GNNs often requires optimizing non-convex neural networks with an abundance of local and global minima, which may differ widely in their performance at test time. Thus, it is essential to develop an optimization strategy that can choose the minima carefully, which can yield strong generalization performance on unseen data. Here we propose an effective training schema, called gSAM, under the principle that theflatter minima has a better generalization ability than thesharper ones. To achieve this goal, gSAM regularizes the flatness of the weight loss landscape by forming a bi-level optimization: the outer problem conducts the standard model training while the inner problem helps the model jump out of the sharp minima. Experimental results show the superiority of our gSAM.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127787709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpenMatch-v2: An All-in-one Multi-Modality PLM-based Information Retrieval Toolkit","authors":"Shi Yu, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu","doi":"10.1145/3539618.3591813","DOIUrl":"https://doi.org/10.1145/3539618.3591813","url":null,"abstract":"Pre-trained language models (PLMs) have emerged as the foundation of the most advanced Information Retrieval (IR) models. Powered by PLMs, the latest IR research has proposed novel models, new domain adaptation algorithms as well as enlarged datasets. In this paper, we present a Python-based IR toolkit OpenMatch-v2. As a full upgrade of OpenMatch proposed in 2021, OpenMatch-v2 incorporates the most recent advancements of PLM-based IR research, providing support for new, cross-modality models and enhanced domain adaptation techniques with a streamlined, optimized infrastructure. The code of OpenMatch is publicly available at https://github.com/OpenMatch/OpenMatch.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128068166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leader-Generator Net: Dividing Skill and Implicitness for Conquering FairytaleQA","authors":"Wei Peng, Wanshui Li, Yue Hu","doi":"10.1145/3539618.3591710","DOIUrl":"https://doi.org/10.1145/3539618.3591710","url":null,"abstract":"Machine reading comprehension requires systems to understand the given passage and answer questions. Previous methods mainly focus on the interaction between the question and passage. However, they ignore the deep exploration of cognitive elements behind questions, such as fine-grained reading skills (this paper focuses on narrative comprehension skills) and implicitness or explicitness of the question (whether the answer can be found in the passage). Grounded in prior literature on reading comprehension, the understanding of a question is a complex process where human beings need to understand the semantics of the question, use different reading skills for different questions, and then judge the implicitness of the question. To this end, a simple but effective Leader-Generator Network is proposed to explicitly separate and extract fine-grained reading skills and the implicitness or explicitness of the question. Specifically, the proposed skill leader accurately captures the semantic representation of fine-grained reading skills with contrastive learning. And the implicitness-aware pointer-generator adaptively extracts or generates the answer based on the implicitness or explicitness of the question. Furthermore, to validate the generalizability of the methodology, we annotate a new dataset named NarrativeQA 1.1. Experiments on the FairytaleQA and NarrativeQA 1.1 show that the proposed model achieves the state-of-the-art performance (about 5% gain on Rouge-L) on the question answering task. Our annotated data and code are available at https://github.com/pengwei-iie/Leader-Generator-Net.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132552977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}