{"title":"Designing Interfaces for Multimodal Vector Search Applications","authors":"Owen Pendrigh Elliott, Tom Hamer, Jesse Clark","doi":"arxiv-2409.11629","DOIUrl":"https://doi.org/arxiv-2409.11629","url":null,"abstract":"Multimodal vector search offers a new paradigm for information retrieval by exposing numerous pieces of functionality which are not possible in traditional lexical search engines. While multimodal vector search can be treated as a drop-in replacement for these traditional systems, the experience can be significantly enhanced by leveraging the unique capabilities of multimodal search. Central to any information retrieval system is a user who expresses an information need; traditional user interfaces with a single search bar allow users to interact with lexical search systems effectively, but are not necessarily optimal for multimodal vector search. In this paper we explore novel capabilities of multimodal vector search applications utilising CLIP models and present implementations and design patterns which better allow users to express their information needs and effectively interact with these systems in an information retrieval context.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GenCRF: Generative Clustering and Reformulation Framework for Enhanced Intent-Driven Information Retrieval","authors":"Wonduk Seo, Haojie Zhang, Yueyang Zhang, Changhao Zhang, Songyao Duan, Lixin Su, Daiting Shi, Jiashu Zhao, Dawei Yin","doi":"arxiv-2409.10909","DOIUrl":"https://doi.org/arxiv-2409.10909","url":null,"abstract":"Query reformulation is a well-known problem in Information Retrieval (IR) aimed at enhancing the successful completion rate of a single search by automatically modifying the user's input query. Recent methods leverage Large Language Models (LLMs) to improve query reformulation, but often generate limited and redundant expansions, potentially constraining their effectiveness in capturing diverse intents. In this paper, we propose GenCRF: a Generative Clustering and Reformulation Framework to capture diverse intentions adaptively based on multiple differentiated, well-generated queries in the retrieval phase for the first time. GenCRF leverages LLMs to generate variable queries from the initial query using customized prompts, then clusters them into groups to distinctly represent diverse intents. Furthermore, the framework explores combining the diverse intent queries with innovative weighted aggregation strategies to optimize retrieval performance, and crucially integrates a novel Query Evaluation Rewarding Model (QERM) to refine the process through feedback loops. Empirical experiments on the BEIR benchmark demonstrate that GenCRF achieves state-of-the-art performance, surpassing previous query reformulation SOTAs by up to 12% on nDCG@10. These techniques can be adapted to various LLMs, significantly boosting retriever performance and advancing the field of Information Retrieval.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Best-of-Both Approach to Improve Match Predictions and Reciprocal Recommendations for Job Search","authors":"Shuhei Goda, Yudai Hayashi, Yuta Saito","doi":"arxiv-2409.10992","DOIUrl":"https://doi.org/arxiv-2409.10992","url":null,"abstract":"Matching users with mutual preferences is a critical aspect of services driven by reciprocal recommendations, such as job search. To produce recommendations in such scenarios, one can predict match probabilities and construct rankings based on these predictions. However, this direct match prediction approach often underperforms due to the extreme sparsity of match labels. Therefore, most existing methods predict preferences separately for each direction (e.g., job seeker to employer and employer to job seeker) and then aggregate the predictions to generate overall matching scores and produce recommendations. However, this typical approach often leads to practical issues, such as biased error propagation between the two models. This paper introduces and demonstrates a novel and practical solution to improve reciprocal recommendations in production by leveraging \\textit{pseudo-match scores}. Specifically, our approach generates dense and more directly relevant pseudo-match scores by combining the true match labels, which are accurate but sparse, with relatively inaccurate but dense match predictions. We then train a meta-model to output the final match predictions by minimizing the prediction loss against the pseudo-match scores. Our method can be seen as a \\textbf{best-of-both (BoB) approach}, as it combines the high-level ideas of both direct match prediction and the two separate models approach. It also allows for user-specific weights to construct \\textit{personalized} pseudo-match scores, achieving even better matching performance through appropriate tuning of the weights. Offline experiments on real-world job search data demonstrate the superior performance of our BoB method, particularly with personalized pseudo-match scores, compared to existing approaches in terms of finding potential matches.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention-Seeker: Dynamic Self-Attention Scoring for Unsupervised Keyphrase Extraction","authors":"Erwin D. López Z., Cheng Tang, Atsushi Shimada","doi":"arxiv-2409.10907","DOIUrl":"https://doi.org/arxiv-2409.10907","url":null,"abstract":"This paper proposes Attention-Seeker, an unsupervised keyphrase extraction method that leverages self-attention maps from a Large Language Model to estimate the importance of candidate phrases. Our approach identifies specific components - such as layers, heads, and attention vectors - where the model pays significant attention to the key topics of the text. The attention weights provided by these components are then used to score the candidate phrases. Unlike previous models that require manual tuning of parameters (e.g., selection of heads, prompts, hyperparameters), Attention-Seeker dynamically adapts to the input text without any manual adjustments, enhancing its practical applicability. We evaluate Attention-Seeker on four publicly available datasets: Inspec, SemEval2010, SemEval2017, and Krapivin. Our results demonstrate that, even without parameter tuning, Attention-Seeker outperforms most baseline models, achieving state-of-the-art performance on three out of four datasets, particularly excelling in extracting keyphrases from long documents.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"205 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenging Fairness: A Comprehensive Exploration of Bias in LLM-Based Recommendations","authors":"Shahnewaz Karim Sakib, Anindya Bijoy Das","doi":"arxiv-2409.10825","DOIUrl":"https://doi.org/arxiv-2409.10825","url":null,"abstract":"Large Language Model (LLM)-based recommendation systems provide more comprehensive recommendations than traditional systems by deeply analyzing content and user behavior. However, these systems often exhibit biases, favoring mainstream content while marginalizing non-traditional options due to skewed training data. This study investigates the intricate relationship between bias and LLM-based recommendation systems, with a focus on music, song, and book recommendations across diverse demographic and cultural groups. Through a comprehensive analysis conducted over different LLM-models, this paper evaluates the impact of bias on recommendation outcomes. Our findings reveal that bias is so deeply ingrained within these systems that even a simpler intervention like prompt engineering cannot significantly reduce bias, underscoring the pervasive nature of the issue. Moreover, factors like intersecting identities and contextual information, such as socioeconomic status, further amplify these biases, demonstrating the complexity and depth of the challenges faced in creating fair recommendations across different groups.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation","authors":"To Eun Kim, Fernando Diaz","doi":"arxiv-2409.11598","DOIUrl":"https://doi.org/arxiv-2409.11598","url":null,"abstract":"Many language models now enhance their responses with retrieval capabilities, leading to the widespread adoption of retrieval-augmented generation (RAG) systems. However, despite retrieval being a core component of RAG, much of the research in this area overlooks the extensive body of work on fair ranking, neglecting the importance of considering all stakeholders involved. This paper presents the first systematic evaluation of RAG systems integrated with fair rankings. We focus specifically on measuring the fair exposure of each relevant item across the rankings utilized by RAG systems (i.e., item-side fairness), aiming to promote equitable growth for relevant item providers. To gain a deep understanding of the relationship between item-fairness, ranking quality, and generation quality in the context of RAG, we analyze nine different RAG systems that incorporate fair rankings across seven distinct datasets. Our findings indicate that RAG systems with fair rankings can maintain a high level of generation quality and, in many cases, even outperform traditional RAG systems, despite the general trend of a tradeoff between ensuring fairness and maintaining system-effectiveness. We believe our insights lay the groundwork for responsible and equitable RAG systems and open new avenues for future research. We publicly release our codebase and dataset at https://github.com/kimdanny/Fair-RAG.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-modal Generative Models in Recommendation System","authors":"Arnau Ramisa, Rene Vidal, Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Mahesh Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci","doi":"arxiv-2409.10993","DOIUrl":"https://doi.org/arxiv-2409.10993","url":null,"abstract":"Many recommendation systems limit user inputs to text strings or behavior signals such as clicks and purchases, and system outputs to a list of products sorted by relevance. With the advent of generative AI, users have come to expect richer levels of interactions. In visual search, for example, a user may provide a picture of their desired product along with a natural language modification of the content of the picture (e.g., a dress like the one shown in the picture but in red color). Moreover, users may want to better understand the recommendations they receive by visualizing how the product fits their use case, e.g., with a representation of how a garment might look on them, or how a furniture item might look in their room. Such advanced levels of interaction require recommendation systems that are able to discover both shared and complementary information about the product across modalities, and visualize the product in a realistic and informative way. However, existing systems often treat multiple modalities independently: text search is usually done by comparing the user query to product titles and descriptions, while visual search is typically done by comparing an image provided by the customer to product images. We argue that future recommendation systems will benefit from a multi-modal understanding of the products that leverages the rich information retailers have about both customers and products to come up with the best recommendations. In this chapter we review recommendation systems that use multiple data modalities simultaneously.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models","authors":"Orion Weller, Benjamin Van Durme, Dawn Lawrie, Ashwin Paranjape, Yuhao Zhang, Jack Hessel","doi":"arxiv-2409.11136","DOIUrl":"https://doi.org/arxiv-2409.11136","url":null,"abstract":"Instruction-tuned language models (LMs) are able to respond to imperative commands, providing a more natural user interface compared to their base counterparts. In this work, we present Promptriever, the first retrieval model able to be prompted like an LM. To train Promptriever, we curate and release a new instance-level instruction training set from MS MARCO, spanning nearly 500k instances. Promptriever not only achieves strong performance on standard retrieval tasks, but also follows instructions. We observe: (1) large gains (reaching SoTA) on following detailed relevance instructions (+14.3 p-MRR / +3.1 nDCG on FollowIR), (2) significantly increased robustness to lexical choices/phrasing in the query+instruction (+12.9 Robustness@10 on InstructIR), and (3) the ability to perform hyperparameter search via prompting to reliably improve retrieval performance (+1.4 average increase on BEIR). Promptriever demonstrates that retrieval models can be controlled with prompts on a per-query basis, setting the stage for future work aligning LM prompting techniques with information retrieval.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond Relevance: Improving User Engagement by Personalization for Short-Video Search","authors":"Wentian Bao, Hu Liu, Kai Zheng, Chao Zhang, Shunyu Zhang, Enyun Yu, Wenwu Ou, Yang Song","doi":"arxiv-2409.11281","DOIUrl":"https://doi.org/arxiv-2409.11281","url":null,"abstract":"Personalized search has been extensively studied in various applications, including web search, e-commerce, social networks, etc. With the soaring popularity of short-video platforms, exemplified by TikTok and Kuaishou, the question arises: can personalization elevate the realm of short-video search, and if so, which techniques hold the key? In this work, we introduce $\\text{PR}^2$, a novel and comprehensive solution for personalizing short-video search, where $\\text{PR}^2$ stands for the Personalized Retrieval and Ranking augmented search system. Specifically, $\\text{PR}^2$ leverages query-relevant collaborative filtering and personalized dense retrieval to extract relevant and individually tailored content from a large-scale video corpus. Furthermore, it utilizes the QIN (Query-Dominate User Interest Network) ranking model to effectively harness user long-term preferences and real-time behaviors, and efficiently learn from users' various implicit feedback through a multi-task learning framework. By deploying $\\text{PR}^2$ in our production system, we have achieved the most remarkable user engagement improvements in recent years: a 10.2% increase in CTR@10, a notable 20% surge in video watch time, and a 1.6% uplift in search DAU. We believe the practical insights presented in this work are especially valuable for building and improving personalized search systems for short-video platforms.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Ranking Content Providers Using Prompt Engineering and Self-Attention Network","authors":"Gosuddin Kamaruddin Siddiqi, Deven Santhosh Shah, Radhika Bansal, Askar Kamalov","doi":"arxiv-2409.11511","DOIUrl":"https://doi.org/arxiv-2409.11511","url":null,"abstract":"This paper addresses the problem of ranking Content Providers for a Content Recommendation System. Content Providers are the sources of news and other types of content, such as lifestyle, travel, and gardening. We propose a framework that leverages explicit user feedback, such as clicks and reactions, and content-based features, such as writing style and frequency of publishing, to rank Content Providers for a given topic. We also use language models to engineer prompts that help us create a ground truth dataset for this previously unsupervised ranking problem. Using this ground truth, we extend the framework with a self-attention-based network trained on a listwise Learning-to-Rank task. We evaluate our framework using online experiments and show that it can improve the quality, credibility, and diversity of the content recommended to users.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}