Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Papers

EDIndex: Enabling Fast Data Queries in Edge Storage Systems
Qiang He, Siyu Tan, Feifei Chen, Xiaolong Xu, Lianyong Qi, X. Hei, Hai Jin, Yun Yang
Abstract: In an edge storage system, popular data can be stored on edge servers to enable low-latency data retrieval for nearby users. Suffering from constrained storage capacities, edge servers must process users' data requests collaboratively. For sourcing data, it is essential to find out which edge servers in the system have the requested data. In this paper, we make the first attempt to study this edge data query (EDQ) problem and present EDIndex, a distributed Edge Data Indexing system to enable fast data queries at the edge. First, we introduce a new index structure named Counting Bloom Filter (CBF) tree for facilitating edge data queries. Then, to improve query performance, we enhance EDIndex with a novel index structure named hierarchical Counting Bloom Filter (HCBF) tree. In EDIndex, each edge server maintains an HCBF tree that indexes the data stored on nearby edge servers to facilitate data sourcing between edge servers. The results of extensive experiments conducted on an edge storage system comprising 90 edge servers demonstrate that EDIndex 1) takes up to 8.8x less time to answer edge data queries compared with state-of-the-art edge indexing systems; and 2) can be implemented in practice with a high query accuracy at low initialization and maintenance overheads.
DOI: https://doi.org/10.1145/3539618.3591676
Published: 2023-07-18
Citations: 7
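
The EDIndex abstract builds its index trees out of counting Bloom filters. As a rough illustration of that building block only, here is a minimal sketch of a plain counting Bloom filter. It is not the CBF tree or HCBF tree from the paper; the class name, the counter count, the number of hashes, and the salted SHA-256 hashing scheme are all assumptions made for this example.

```python
import hashlib


class CountingBloomFilter:
    """Plain counting Bloom filter (illustrative; sizes and hashing are assumptions)."""

    def __init__(self, num_counters: int = 1024, num_hashes: int = 3) -> None:
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item: str) -> list[int]:
        # Derive one counter position per hash by salting SHA-256 with the hash index.
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_counters)
        return positions

    def add(self, item: str) -> None:
        # Increment every counter the item maps to.
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Decrement only if the item may be present, so counters never go negative.
        if item in self:
            for pos in self._positions(item):
                self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly present" (false positives can occur); False is definite.
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

Unlike a bit-based Bloom filter, the counters allow deletions, which is presumably why a counting variant suits an index over cached data that changes over time. A membership test can still return a false positive, but never a false negative for an item that was added and not removed.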
Alleviating Matching Bias in Marketing Recommendations
Junpeng Fang, Qing Cui, Gongduo Zhang, Caizhi Tang, Lihong Gu, Longfei Li, Jinjie Gu, Jun Zhou, Fei Wu
Abstract: In marketing recommendations, the campaign organizers will distribute coupons to users to encourage consumption. In general, a series of strategies are employed to interfere with the coupon distribution process, leading to a growing imbalance in user-coupon interactions and, in turn, a bias in the estimation of conversion probabilities. We refer to this estimation bias as the matching bias. In this paper, we explore how to alleviate the matching bias from the causal-effect perspective. We regard the historical distributions of users and coupons over each other as confounders and characterize the matching bias as a confounding effect to reveal and eliminate the spurious correlations between user-coupon representations and conversion probabilities. Then we propose a new training paradigm named De-Matching Bias Recommendation (DMBR) to remove the confounding effects during model training via the backdoor adjustment. We instantiate DMBR on two representative models, DNN and MMOE, and conduct extensive offline and online experiments to demonstrate the effectiveness of our proposed paradigm.
DOI: https://doi.org/10.1145/3539618.3591854
Published: 2023-07-18
Citations: 1
Can Generative LLMs Create Query Variants for Test Collections? An Exploratory Study
L. Gallagher, Marwah Alaofi, M. Sanderson, Falk Scholer
Abstract: This paper explores the utility of a Large Language Model (LLM) to automatically generate queries and query variants from a description of an information need. Given a set of information needs described as backstories, we explore how similar the queries generated by the LLM are to those generated by humans. We quantify the similarity using different metrics and examine how the use of each set would contribute to document pooling when building test collections. Our results show potential in using LLMs to generate query variants. While they may not fully capture the wide variety of human-generated variants, they generate similar sets of relevant documents, reaching up to 71.1% overlap at a pool depth of 100.
DOI: https://doi.org/10.1145/3539618.3591960
Published: 2023-07-18
Citations: 2
Aligning Distillation For Cold-start Item Recommendation
Feiran Huang, Zefan Wang, Xiao Huang, Yu-hong Qian, Zhetao Li, Hao Chen
Abstract: Recommending cold items in recommendation systems is a longstanding challenge due to the inherent differences between warm items, which are recommended based on user behavior, and cold items, which are recommended based on content features. To tackle this, generative models generate synthetic embeddings from content features, while dropout models enhance the robustness of the recommendation system by randomly dropping behavioral embeddings during training. However, these models primarily focus on handling the recommendation of cold items, but do not effectively address the differences between warm and cold recommendations. As a result, generative models may over-recommend either warm or cold items, neglecting the other type, and dropout models may negatively impact warm item recommendations. To address this, we propose the Aligning Distillation (ALDI) framework, which leverages warm items as "teachers" to transfer their behavioral information to cold items, referred to as "students". ALDI aligns the students with the teachers by comparing the differences in their recommendation characteristics, using tailored rating distribution aligning, ranking aligning, and identification aligning losses to narrow these differences. Furthermore, ALDI incorporates a teacher-qualifying weighting structure to prevent students from learning inaccurate information from unreliable teachers. Experiments on three datasets show that our approach outperforms state-of-the-art baselines in terms of overall, warm, and cold recommendation performance with three different recommendation backbones.
DOI: https://doi.org/10.1145/3539618.3591732
Published: 2023-07-18
Citations: 3
A Model-Agnostic Popularity Debias Training Framework for Click-Through Rate Prediction in Recommender System
Fan Zhang, Qijie Shen
Abstract: Recommender systems (RS) are widely applied in a multitude of scenarios to help individuals obtain the information they require efficiently. At the same time, the prevalence of popularity bias in such systems has become a widely acknowledged issue. To address this challenge, we propose a novel method named Model-Agnostic Popularity Debias Training Framework (MDTF). It consists of two basic modules: 1) General Ranking Model (GRM), which is model-agnostic and can be implemented as any ranking model; and 2) Popularity Debias Module (PDM), which estimates the impact of the competitiveness and popularity of candidate items on the CTR by utilizing the feedback of cold-start users to re-weight the loss in GRM. MDTF seamlessly integrates these two modules in an end-to-end multi-task learning framework. Extensive experiments on both a real-world offline dataset and an online A/B test demonstrate its superiority over state-of-the-art methods.
DOI: https://doi.org/10.1145/3539618.3591939
Published: 2023-07-18
Citations: 0
Cross-Market Product-Related Question Answering
Negin Ghasemi, Mohammad Aliannejadi, Hamed Bonab, E. Kanoulas, Arjen P. de Vries, J. Allan, D. Hiemstra
Abstract: Online shops such as Amazon, eBay, and Etsy continue to expand their presence in multiple countries, creating new resource-scarce marketplaces with thousands of items. We consider a marketplace to be resource-scarce when only limited user-generated data is available about the products (e.g., ratings, reviews, and product-related questions). In such a marketplace, an information retrieval system is less likely to help users find answers to their questions about the products. As a result, questions posted online may go unanswered for extended periods. This study investigates the impact of using available data in a resource-rich marketplace to answer new questions in a resource-scarce marketplace, a new problem we call cross-market question answering. To study this problem's potential impact, we collect and annotate a new dataset, XMarket-QA, from Amazon's UK (resource-scarce) and US (resource-rich) local marketplaces. We conduct a data analysis to understand the scope of the cross-market question-answering task. This analysis shows a temporal gap of almost one year between the first question answered in the UK marketplace and the US marketplace. Also, it shows that the first question about a product is posted in the UK marketplace only when 28 questions, on average, have already been answered about the same product in the US marketplace. Human annotations demonstrate that, on average, 65% of the questions in the UK marketplace can be answered within the US marketplace, supporting the concept of cross-market question answering. Inspired by these findings, we develop a new method, CMJim, which utilizes product similarities across marketplaces in the training phase for retrieving answers from the resource-rich marketplace that can be used to answer a question in the resource-scarce marketplace. Our evaluations show CMJim's significant improvement compared to competitive baselines.
DOI: https://doi.org/10.1145/3539618.3591658
Published: 2023-07-18
Citations: 0
MetroScope: An Advanced System for Real-Time Detection and Analysis of Metro-Related Threats and Events via Twitter
Jianfeng He, Syuan-Ying Wu, Abdulaziz Alhamadani, Chih-Fang Chen, Wen-Fang Lu, Chang-Tien Lu, David Solnick, Yanlin Li
Abstract: Metro systems are vital to our daily lives, but they face safety and reliability challenges, such as criminal activities and infrastructure disruptions, respectively. Real-time threat detection and analysis are crucial to ensure their safety and reliability. Although many existing systems use Twitter to detect metro-related threats or events in real time, they have limitations in event analysis and system maintenance. Specifically, they cannot analyze event development or prioritize events from numerous tweets. Besides, their users are required to continuously monitor system notifications, use inefficient content retrieval methods, and perform detailed system maintenance. We addressed those issues by developing the MetroScope system, a real-time threat/event detection system applied to the Washington D.C. metro system. MetroScope can automatically analyze event development, prioritize events based on urgency, send emergency notifications via emails, provide efficient content retrieval, and self-maintain the system. Our MetroScope system is now available at http://orion.nvc.cs.vt.edu:5000/, with a video (https://www.youtube.com/watch?v=vKIK9M60-J8) introducing its features and instructions. MetroScope is a significant advancement in enhancing the safety and reliability of metro systems.
DOI: https://doi.org/10.1145/3539618.3591807
Published: 2023-07-18
Citations: 1
TIB AV-Analytics: A Web-based Platform for Scholarly Video Analysis and Film Studies
Matthias Springstein, Markos Stamatakis, Margret Plank, Julian Sittel, Roman Mauer, Oksana Bulgakowa, R. Ewerth, Eric Müller-Budack
Abstract: Video analysis platforms that integrate automatic solutions for multimedia and information retrieval enable various applications in many disciplines including film and media studies, communication science, and education. However, current platforms for video analysis either focus on manual annotations or include only a few tools for automatic content analysis. In this paper, we present a novel web-based video analysis platform called TIB AV-Analytics (TIB-AV-A). Unlike previous platforms, TIB-AV-A integrates state-of-the-art approaches in the fields of computer vision, audio analysis, and natural language processing for many relevant video analysis tasks. To facilitate future extensions and to ensure interoperability with existing tools, the video analysis approaches are implemented in a plugin structure with appropriate interfaces and import-export functions. TIB-AV-A leverages modern web technologies to provide users with a responsive and interactive web interface that enables manual annotation and provides access to powerful deep learning tools without requiring specific hardware. Source code and demo are publicly available at: https://service.tib.eu/tibava.
DOI: https://doi.org/10.1145/3539618.3591820
Published: 2023-07-18
Citations: 0
MaxSimE: Explaining Transformer-based Semantic Similarity via Contextualized Best Matching Token Pairs
E. Brito, Henri Iser
Abstract: Current semantic search approaches rely on black-box language models, such as BERT, which limit their interpretability and transparency. In this work, we propose MaxSimE, an explanation method for language models applied to measure semantic similarity. Our approach is inspired by the explainable-by-design ColBERT architecture and generates explanations by matching contextualized query tokens to the most similar tokens from the retrieved document according to the cosine similarity of their embeddings. Unlike existing post-hoc explanation methods, which may lack fidelity to the model and thus fail to provide trustworthy explanations in critical settings, we demonstrate that MaxSimE can generate faithful explanations under certain conditions and show how it improves the interpretability of semantic search results on ranked documents from the LoTTe benchmark, showing its potential for trustworthy information retrieval.
DOI: https://doi.org/10.1145/3539618.3592017
Published: 2023-07-18
Citations: 0
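
The matching step described in the MaxSimE abstract (pair each query token with the document token whose embedding has the highest cosine similarity) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `cosine` and `best_matching_pairs` are hypothetical, the embeddings are expected to be supplied by the caller as plain lists of floats, and nothing here reproduces the authors' implementation or any particular language model.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_matching_pairs(
    query_tokens: list[str],
    query_vecs: list[list[float]],
    doc_tokens: list[str],
    doc_vecs: list[list[float]],
) -> list[tuple[str, str, float]]:
    """For each query token, return (query token, best document token, similarity)."""
    pairs = []
    for q_tok, q_vec in zip(query_tokens, query_vecs):
        # Score this query token against every document token.
        sims = [cosine(q_vec, d_vec) for d_vec in doc_vecs]
        # Keep the document token with the maximum similarity.
        best = max(range(len(sims)), key=sims.__getitem__)
        pairs.append((q_tok, doc_tokens[best], sims[best]))
    return pairs
```

In a real setting the vectors would be contextualized token embeddings from the retrieval model; the returned pairs are then the token-level evidence shown to the user as an explanation.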
Data-Aware Proxy Hashing for Cross-modal Retrieval
Rong-Cheng Tu, Xian-Ling Mao, Wenjin Ji, Wei Wei, Heyan Huang
Abstract: Recently, numerous proxy hash code based methods, which sufficiently exploit the label information of data to supervise the training of hashing models, have been proposed. Although these methods have made impressive progress, their generating processes of proxy hash codes are based only on the class information of the dataset or labels of data but do not take the data themselves into account. Therefore, these methods will probably generate some inappropriate proxy hash codes, thus damaging the retrieval performance of the hash models. To solve the aforementioned problem, we propose a novel Data-Aware Proxy Hashing method for cross-modal retrieval, called DAPH. Specifically, our proposed method first trains a data-aware proxy network that takes the data points, label vectors of data, and the class vectors of the dataset as inputs to generate class-based data-aware proxy hash codes, label-fused image-aware proxy hash codes, and label-fused text-aware proxy hash codes. Then, we propose a novel hash loss that exploits the three types of data-aware proxy hash codes to supervise the training of modality-specific hashing networks. After training, DAPH is able to generate discriminative hash codes with the semantic information preserved adequately. Extensive experiments on three benchmark datasets show that the proposed DAPH outperforms the state-of-the-art baselines in cross-modal retrieval tasks.
DOI: https://doi.org/10.1145/3539618.3591660
Published: 2023-07-18
Citations: 0