Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Publications

An Effective Framework for Enhancing Query Answering in a Heterogeneous Data Lake
Qin Yuan, Ye Yuan, Z. Wen, He Wang, Shiyuan Tang
DOI: https://doi.org/10.1145/3539618.3591637
Abstract: There has been growing interest in recent years in cross-source search as a way to gain rich knowledge. A data lake collects massive raw, heterogeneous data with differing data schemas and query interfaces. Many real-life applications, such as e-commerce, bioinformatics, and healthcare, require query answering over such a heterogeneous data lake. In this paper, we propose LakeAns, which semantically integrates the heterogeneous data schemas of the lake to enhance the semantics of query answers. To this end, we propose a novel framework that performs cross-source search efficiently and effectively. The framework exploits a reinforcement learning method to semantically integrate the data schemas and create a global relational schema for the heterogeneous data. It then runs a query-answering algorithm over the global schema to find answers across multiple data sources. Extensive experimental evaluations on real-life data verify that our approach outperforms existing solutions in both effectiveness and efficiency.
Citations: 0
Semantic-enhanced Modality-asymmetric Retrieval for Online E-commerce Search
Zhigong Zhou, Ning Ding, Xiaochuan Fan, Yue Shang, Yiming Qiu, Jingwei Zhuo, Zhiwei Ge, Songlin Wang, Lin Liu, Sulong Xu, Han Zhang
DOI: https://doi.org/10.1145/3539618.3591863
Abstract: Semantic retrieval, which retrieves semantically matched items for a given textual query, has been an essential component for improving system effectiveness in e-commerce search. In this paper, we study the multimodal retrieval problem, where an item's visual information (e.g., images) is leveraged to supplement its textual information, enriching the item representation and further improving retrieval performance. Although learning from cross-modality data has been studied extensively in tasks such as visual question answering and media summarization, multimodal retrieval remains a non-trivial, unsolved problem, especially in the asymmetric scenario where the query is unimodal while the item is multimodal. We propose a novel model named SMAR, short for Semantic-enhanced Modality-Asymmetric Retrieval, to tackle modality fusion and alignment in this asymmetric scenario. Extensive experiments on an industrial dataset show that the proposed model significantly outperforms baseline models in retrieval accuracy. We have open-sourced our industrial dataset for the sake of reproducibility and future research.
Citations: 0
Rating Prediction in Conversational Task Assistants with Behavioral and Conversational-Flow Features
Rafael Ferreira, David Semedo, João Magalhães
DOI: https://doi.org/10.1145/3539618.3592048
Abstract: Predicting the success of Conversational Task Assistants (CTAs) is critical to understanding user behavior and acting accordingly. In this paper, we propose TB-Rater, a Transformer model that combines conversational-flow features with user-behavior features to predict user ratings in a CTA scenario. In particular, we use real human-agent conversations and ratings collected in the Alexa TaskBot challenge, a novel multimodal and multi-turn conversational setting. Our results show the advantages of modeling both the conversational-flow and behavioral aspects of a conversation in a single model for offline rating prediction. Additionally, an analysis of CTA-specific behavioral features brings insights into this setting and can be used to bootstrap future systems.
Citations: 1
Multimodal Named Entity Recognition and Relation Extraction with Retrieval-Augmented Strategy
Xuming Hu
DOI: https://doi.org/10.1145/3539618.3591790
Abstract: Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE) are information-retrieval tasks that aim to recognize entities and extract the relations among them using information from multiple modalities, such as text and images. Although current methods have attempted a variety of modality-fusion approaches to enrich the information in text, a large amount of readily available internet retrieval data has not been exploited. We therefore retrieve real-world text related to images, objects, and entire sentences from the internet and use this retrieved text as input for cross-modal fusion, improving performance on entity and relation extraction.
Citations: 0
Calibration Learning for Few-shot Novel Product Description
Zheng Liu, Mingjing Wu, Bo Peng, Yichao Liu, Qi Peng, Chong Zou
DOI: https://doi.org/10.1145/3539618.3591959
Abstract: In e-commerce, the rapid introduction of new products poses challenges for product-description generation. Traditional approaches rely on large labelled datasets, which are often unavailable for novel products with limited data. To address this issue, we propose a calibration-learning approach for few-shot novel product description. Our method leverages a small amount of labelled data for calibration and uses the novel product's semantic representation as a prompt to generate accurate and informative descriptions. We evaluate our approach on three large-scale e-commerce datasets of novel products and show that it significantly improves the quality of generated product descriptions over existing methods, especially when only limited data is available. We also analyze the impact of the different modules on performance.
Citations: 0
Improving Programming Q&A with Neural Generative Augmentation
Suthee Chaidaroon, Xiao Zhang, Shruti Subramaniyam, Jeffrey Svajlenko, Tanya Shourya, I. Keivanloo, Ria Joy
DOI: https://doi.org/10.1145/3539618.3591860
Abstract: Knowledge-intensive programming Q&A is an active research area in industry. It boosts developer productivity by helping developers quickly find programming answers in the vast amount of information on the internet. In this study, we propose ProQANS and its variants ReProQANS and ReAugProQANS for programming Q&A. ProQANS is a neural search approach that leverages unlabeled data on the internet (such as StackOverflow) to mitigate the cold-start problem. ReProQANS extends ProQANS with reformulated queries and a novel triplet loss. We further use an auxiliary generative model to augment the training queries and design a novel dual triplet loss to adapt these generated queries, yielding another variant termed ReAugProQANS. In our experiments, ReProQANS performs best on the in-domain test set, while ReAugProQANS is superior on out-of-domain real programming questions, outperforming the state-of-the-art model by up to a 477% lift in MRR. The results suggest robustness to previously unseen questions and wide applicability to real programming questions.
Citations: 0
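The headline result above is a lift in MRR (Mean Reciprocal Rank), a standard retrieval metric: the average, over queries, of one over the rank of the first relevant result. A minimal sketch of the computation, for readers unfamiliar with the metric (illustrative only, not the authors' evaluation code):

```python
def mean_reciprocal_rank(rankings, relevant):
    """MRR: average of 1/rank of the first relevant item per query
    (contributes 0 when no relevant item is retrieved)."""
    total = 0.0
    for query_id, ranking in rankings.items():
        for rank, item in enumerate(ranking, start=1):
            if item in relevant.get(query_id, set()):
                total += 1.0 / rank
                break
    return total / len(rankings)

# Toy example: first relevant hit at rank 1 for q1 and rank 2 for q2,
# so MRR = (1 + 0.5) / 2 = 0.75
rankings = {"q1": ["a", "b"], "q2": ["c", "d"]}
relevant = {"q1": {"a"}, "q2": {"d"}}
print(mean_reciprocal_rank(rankings, relevant))  # 0.75
```

A "477% lift" on this metric means the first correct answer appears far earlier in the ranking on average than with the baseline.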
AdaMCL: Adaptive Fusion Multi-View Contrastive Learning for Collaborative Filtering
Guanghui Zhu, Wang Lu, C. Yuan, Y. Huang
DOI: https://doi.org/10.1145/3539618.3591632
Abstract: Graph collaborative filtering has achieved great success in capturing users' preferences over items. Despite their effectiveness, graph neural network (GNN)-based methods suffer from data sparsity in real scenarios. Recently, contrastive learning (CL) has been used to address data sparsity, but most CL-based methods leverage only the original user-item interaction graph to construct the CL task and do not explicitly exploit higher-order information (i.e., user-user and item-item relationships). Even for CL-based methods that do use higher-order information, the receptive field for that information is fixed, regardless of differences between nodes. In this paper, we propose AdaMCL, a novel adaptive multi-view fusion contrastive learning framework for graph collaborative filtering. To exploit higher-order information more accurately, we propose an adaptive fusion strategy that fuses the embeddings learned from the user-item and user-user graphs. Moreover, we propose a multi-view fusion contrastive learning paradigm to construct effective CL tasks and, to alleviate the noise introduced by aggregating higher-order neighbors, a layer-level CL task. Extensive experimental results show that AdaMCL is effective and significantly outperforms existing collaborative filtering models.
Citations: 1
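The abstract does not spell out the contrastive objective; most CL-based collaborative filtering methods build on an InfoNCE-style loss that pulls together two views (e.g., two graph-augmented embeddings) of the same node while pushing apart in-batch negatives. A minimal NumPy sketch of that standard loss, given only as an assumption for illustration, not as AdaMCL's actual objective:

```python
import numpy as np

def info_nce(view_a, view_b, temperature=0.2):
    """InfoNCE over two batches of node embeddings: row i of view_a
    should match row i of view_b; every other row in the batch acts
    as a negative. Returns the mean -log p(positive)."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # positives sit on the diagonal
```

Aligned views (identical rows in matching positions) yield a small loss; views whose matching rows disagree yield a large one, which is what drives the two embedding views toward agreement during training.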
DICE: a Dataset of Italian Crime Event news
Giovanni Bonisoli, Maria Pia di Buono, Laura Po, Federica Rollo
DOI: https://doi.org/10.1145/3539618.3591904
Abstract: Extracting events from news stories, the aim of several Natural Language Processing (NLP) applications (e.g., question answering, news recommendation, news summarization), is not a trivial task, owing to the complexity of natural language and the fact that news reporting follows journalistic style and norms. These aspects scatter an event description over several sentences within one document (or across documents), applying a mechanism of gradual specification of event-related information. This implies widespread use of co-reference relations among textual elements, conveying non-linear temporal information. In addition, despite state-of-the-art results on several tasks, high-quality training datasets for non-English languages are rarely available. This paper presents our preliminary study toward an annotated Dataset of Italian Crime Event news (DICE). The contributions of the paper are: (1) a corpus of 10,395 crime news articles; (2) an annotation schema; (3) a dataset of 10,395 articles with automatic annotations; and (4) a preliminary manual annotation of 1,000 documents using the proposed schema. The first tests on DICE compared the performance of a manual annotator with that of single-span and multi-span question answering models, and showed that the models still fall short, especially on more complex annotation tasks with limited training data. This underscores the importance of investing in high-quality annotated datasets like DICE, which can provide a solid foundation for training and testing a wide range of NLP models.
Citations: 0
MEME: Multi-Encoder Multi-Expert Framework with Data Augmentation for Video Retrieval
Seong-Min Kang, Yoon-Sik Cho
DOI: https://doi.org/10.1145/3539618.3591726
Abstract: Text-to-video (T2V) retrieval aims to find relevant videos from text queries. The recently introduced Contrastive Language-Image Pretraining (CLIP), a language-vision model pretrained on large-scale image-caption pairs, has been extensively studied for this task. Existing work on T2V aims to transfer CLIP knowledge and focuses on enhancing retrieval performance through fine-grained representation learning. While fine-grained contrast has achieved some remarkable results, less attention has been paid to coarse-grained contrast. To this end, we propose Graph Patch Spreading (GPS), a method that aggregates patches across frames at the coarse-grained level, and apply it within our proposed Multi-Encoder Multi-Expert (MEME) framework. Our scheme is general enough to apply to any existing CLIP-based video-text retrieval model. We demonstrate the effectiveness of our method on existing models over the benchmark datasets MSR-VTT, MSVD, and LSMDC. Our code can be found at https://github.com/kang7734/MEME__.
Citations: 0
VoMBaT: A Tool for Visualising Evaluation Measure Behaviour in High-Recall Search Tasks
Wojciech Kusa, Aldo Lipani, Petr Knoth, A. Hanbury
DOI: https://doi.org/10.1145/3539618.3591802
Abstract: The objective of High-Recall Information Retrieval (HRIR) is to retrieve as many relevant documents as possible for a given search topic. One approach to HRIR is Technology-Assisted Review (TAR), which uses information retrieval and machine learning techniques to aid the review of large document collections. TAR systems are commonly used in legal eDiscovery and systematic literature reviews. Successful TAR systems find the majority of relevant documents with the fewest assessments. Commonly used retrospective evaluation assumes that the system first achieves a specific, fixed recall level and then measures precision or work saved (e.g., precision at r% recall). This approach can obscure how evaluation measures behave in a fixed-recall setting, and it is also problematic when estimating time and money savings during technology-assisted reviews. This paper presents a new visual analytics tool for exploring the dynamics of evaluation measures as a function of recall level. We implemented 18 evaluation measures based on confusion-matrix terms, both from general IR tasks and specific to TAR. The tool allows comparison of the behaviour of these measures in a fixed-recall evaluation setting. It can also simulate savings in time and money, and the count of manual vs. automatic assessments, for different datasets depending on model quality. The tool is open source, and a demo is available at https://vombat.streamlit.app.
Citations: 3
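In the fixed-recall evaluation the abstract describes, every confusion-matrix term follows arithmetically from four quantities: collection size, number of relevant documents, the target recall, and the precision achieved at that recall. A minimal sketch of that derivation (function and field names here are hypothetical, not VoMBaT's API):

```python
def fixed_recall_stats(n_docs, n_relevant, recall, precision_at_recall):
    """Derive confusion-matrix terms for a review stopped at a fixed
    recall level: TP from recall, FP from precision, the rest from totals."""
    tp = recall * n_relevant                    # relevant docs actually found
    reviewed = tp / precision_at_recall         # docs assessed to reach that recall
    fp = reviewed - tp                          # non-relevant docs assessed
    fn = n_relevant - tp                        # relevant docs missed
    tn = n_docs - tp - fp - fn                  # non-relevant docs never assessed
    work_saved = 1 - reviewed / n_docs          # fraction of the collection skipped
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "work_saved": work_saved}

# 10,000 documents, 500 relevant, stopping at 80% recall with precision 0.10:
# TP = 400, 4,000 documents reviewed, so roughly 60% of the collection
# is never assessed manually.
stats = fixed_recall_stats(10_000, 500, 0.80, 0.10)
```

Any of the 18 measures mentioned in the abstract can then be written as a function of these four terms, which is what makes their behaviour plottable against the recall level.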