Title: Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference
Authors: Najmeh Forouzandehmehr, Nima Farrokhsiar, Ramin Giahi, Evren Korpeoglu, Kannan Achan
arXiv:2409.12150 (arXiv - CS - Information Retrieval, 2024-09-18)
Abstract: Personalized outfit recommendation remains a complex challenge, demanding both fashion compatibility understanding and trend awareness. This paper presents a novel framework that harnesses the expressive power of large language models (LLMs) for this task, mitigating their "black box" and static nature through fine-tuning and direct feedback integration. We bridge the visual-textual gap in item descriptions by employing image captioning with a Multimodal Large Language Model (MLLM), enabling the LLM to extract style and color characteristics from human-curated fashion images as the basis for personalized recommendations. The LLM is efficiently fine-tuned on the open-source Polyvore dataset of curated fashion images, optimizing its ability to recommend stylish outfits. A direct preference mechanism using negative examples is employed to enhance the LLM's decision-making, creating a self-enhancing AI feedback loop that continuously refines recommendations in line with seasonal fashion trends. The framework is evaluated on the Polyvore dataset on two key tasks, fill-in-the-blank and complementary item retrieval, demonstrating its ability to generate stylish, trend-aligned outfit suggestions that improve continuously through direct feedback. The results show that the proposed framework significantly outperforms the base LLM, producing more cohesive outfits and underscoring its potential to enhance the shopping experience with accurate suggestions over vanilla LLM-based outfit generation.

Title: The Factuality of Large Language Models in the Legal Domain
Authors: Rajaa El Hamdani, Thomas Bonald, Fragkiskos Malliaros, Nils Holzenberger, Fabian Suchanek
arXiv:2409.11798 (arXiv - CS - Information Retrieval, 2024-09-18)
Abstract: This paper investigates the factuality of large language models (LLMs) as knowledge bases in the legal domain, in a realistic usage scenario: we allow for acceptable variations in the answer, and let the model abstain from answering when uncertain. First, we design a dataset of diverse factual questions about case law and legislation. We then use the dataset to evaluate several LLMs under different evaluation methods, including exact, alias, and fuzzy matching. Our results show that performance improves significantly under the alias and fuzzy matching methods. Further, we explore the impact of abstaining and of in-context examples, finding that both strategies enhance precision. Finally, we demonstrate that additional pre-training on legal documents, as seen with SaulLM, further improves factual precision from 63% to 81%.

Title: Active Reconfigurable Intelligent Surface Empowered Synthetic Aperture Radar Imaging
Authors: Yifan Sun, Rang Liu, Zhiping Lu, Honghao Luo, Ming Li, Qian Liu
arXiv:2409.11728 (arXiv - CS - Information Retrieval, 2024-09-18)
Abstract: Synthetic Aperture Radar (SAR) utilizes the movement of the radar antenna over a specific area of interest to achieve higher spatial resolution imaging. In this paper, we investigate the realization of SAR imaging for a stationary radar system with the assistance of an active reconfigurable intelligent surface (ARIS) mounted on an unmanned aerial vehicle (UAV). As the UAV moves along the stationary trajectory, the ARIS can not only build a high-quality virtual line-of-sight (LoS) propagation path, but its mobility can also effectively create a much larger virtual aperture, which can be utilized to realize a SAR system. We first present a range-Doppler (RD) imaging algorithm to obtain imaging results for the proposed ARIS-empowered SAR system. Then, to further improve SAR imaging performance, we optimize the reflection coefficients of the ARIS to maximize the signal-to-noise ratio (SNR) at the stationary radar receiver under constraints on the maximum ARIS power and amplification factor. An effective algorithm based on fractional programming (FP) and majorization-minimization (MM) methods is developed to solve the resulting non-convex problem. Simulation results validate the effectiveness of ARIS-assisted SAR imaging and of the proposed RD imaging and ARIS optimization algorithms.

Title: Basket-Enhanced Heterogenous Hypergraph for Price-Sensitive Next Basket Recommendation
Authors: Yuening Zhou, Yulin Wang, Qian Cui, Xinyu Guan, Francisco Cisternas
arXiv:2409.11695 (arXiv - CS - Information Retrieval, 2024-09-18)
Abstract: Next Basket Recommendation (NBR) is a new type of recommender system that predicts combinations of items users are likely to purchase together. Existing NBR models often overlook a crucial factor, price, and do not fully capture item-basket-user interactions. To address these limitations, we propose a novel method called Basket-augmented Dynamic Heterogeneous Hypergraph (BDHH). BDHH utilizes a heterogeneous multi-relational graph to capture the intricate relationships among item features, with price as a critical factor. Moreover, our approach includes a basket-guided dynamic augmentation network that dynamically enhances item-basket-user interactions. Experiments on real-world datasets demonstrate that BDHH significantly improves recommendation accuracy, providing a more comprehensive understanding of user behavior.

Title: Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation
Authors: Kasra Hosseini, Thomas Kober, Josip Krapac, Roland Vollgraf, Weiwei Cheng, Ana Peleteiro Ramallo
arXiv:2409.11860 (arXiv - CS - Information Retrieval, 2024-09-18)
Abstract: Evaluating production-level retrieval systems at scale is a crucial yet challenging task due to the limited availability of a large pool of well-trained human annotators. Large Language Models (LLMs) have the potential to address this scaling issue and offer a viable alternative to humans for the bulk of annotation tasks. In this paper, we propose a framework for assessing product search engines in a large-scale e-commerce setting, leveraging Multimodal LLMs for (i) generating tailored annotation guidelines for individual queries, and (ii) conducting the subsequent annotation task. Our method, validated through deployment on a large e-commerce platform, demonstrates quality comparable to human annotations, significantly reduces time and cost, facilitates rapid problem discovery, and provides an effective solution for production-level quality control at scale.
{"title":"FLARE: Fusing Language Models and Collaborative Architectures for Recommender Enhancement","authors":"Liam Hebert, Marialena Kyriakidi, Hubert Pham, Krishna Sayana, James Pine, Sukhdeep Sodhi, Ambarish Jash","doi":"arxiv-2409.11699","DOIUrl":"https://doi.org/arxiv-2409.11699","url":null,"abstract":"Hybrid recommender systems, combining item IDs and textual descriptions,\u0000offer potential for improved accuracy. However, previous work has largely\u0000focused on smaller datasets and model architectures. This paper introduces\u0000Flare (Fusing Language models and collaborative Architectures for Recommender\u0000Enhancement), a novel hybrid recommender that integrates a language model (mT5)\u0000with a collaborative filtering model (Bert4Rec) using a Perceiver network. This\u0000architecture allows Flare to effectively combine collaborative and content\u0000information for enhanced recommendations. We conduct a two-stage evaluation, first assessing Flare's performance\u0000against established baselines on smaller datasets, where it demonstrates\u0000competitive accuracy. Subsequently, we evaluate Flare on a larger, more\u0000realistic dataset with a significantly larger item vocabulary, introducing new\u0000baselines for this setting. Finally, we showcase Flare's inherent ability to\u0000support critiquing, enabling users to provide feedback and refine\u0000recommendations. We further leverage critiquing as an evaluation method to\u0000assess the model's language understanding and its transferability to the\u0000recommendation task.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the Effects of the Baidu-ULTR Logging Policy on Two-Tower Models","authors":"Morris de Haan, Philipp Hager","doi":"arxiv-2409.12043","DOIUrl":"https://doi.org/arxiv-2409.12043","url":null,"abstract":"Despite the popularity of the two-tower model for unbiased learning to rank\u0000(ULTR) tasks, recent work suggests that it suffers from a major limitation that\u0000could lead to its collapse in industry applications: the problem of logging\u0000policy confounding. Several potential solutions have even been proposed;\u0000however, the evaluation of these methods was mostly conducted using\u0000semi-synthetic simulation experiments. This paper bridges the gap between\u0000theory and practice by investigating the confounding problem on the largest\u0000real-world dataset, Baidu-ULTR. Our main contributions are threefold: 1) we\u0000show that the conditions for the confounding problem are given on Baidu-ULTR,\u00002) the confounding problem bears no significant effect on the two-tower model,\u0000and 3) we point to a potential mismatch between expert annotations, the golden\u0000standard in ULTR, and user click behavior.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems
Authors: Peng Liu, Jiawei Zhu, Cong Xu, Ming Zhao, Bin Wang
arXiv:2409.11678 (arXiv - CS - Information Retrieval, 2024-09-18)
Abstract: As the last key stage of Recommender Systems (RSs), Multi-Task Fusion (MTF) is in charge of combining the multiple scores predicted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which determines the ultimate recommendation results. In recent years, to maximize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) has been widely used for MTF in large-scale RSs. However, limited by their modeling pattern, current RL-MTF methods can only use user features as the state to generate an action for each user; they cannot make use of item features and other valuable features, which leads to suboptimal results. Addressing this problem requires breaking through the current modeling pattern of RL-MTF. To solve it, we propose a novel method called Enhanced-State RL for MTF in RSs. Unlike the existing methods mentioned above, our method first defines user features, item features, and other valuable features collectively as the enhanced state; it then proposes a novel actor and critic learning process that utilizes the enhanced state to produce a much better action for each user-item pair. To the best of our knowledge, this modeling pattern is proposed for the first time in the field of RL-MTF. We conduct extensive offline and online experiments in a large-scale RS. The results demonstrate that our model significantly outperforms other models. Enhanced-State RL has been fully deployed in our RS for more than half a year, improving user valid consumption by +3.84% and user duration time by +0.58% compared to the baseline.
{"title":"LLM-Powered Text Simulation Attack Against ID-Free Recommender Systems","authors":"Zongwei Wang, Min Gao, Junliang Yu, Xinyi Gao, Quoc Viet Hung Nguyen, Shazia Sadiq, Hongzhi Yin","doi":"arxiv-2409.11690","DOIUrl":"https://doi.org/arxiv-2409.11690","url":null,"abstract":"The ID-free recommendation paradigm has been proposed to address the\u0000limitation that traditional recommender systems struggle to model cold-start\u0000users or items with new IDs. Despite its effectiveness, this study uncovers\u0000that ID-free recommender systems are vulnerable to the proposed Text Simulation\u0000attack (TextSimu) which aims to promote specific target items. As a novel type\u0000of text poisoning attack, TextSimu exploits large language models (LLM) to\u0000alter the textual information of target items by simulating the characteristics\u0000of popular items. It operates effectively in both black-box and white-box\u0000settings, utilizing two key components: a unified popularity extraction module,\u0000which captures the essential characteristics of popular items, and an N-persona\u0000consistency simulation strategy, which creates multiple personas to\u0000collaboratively synthesize refined promotional textual descriptions for target\u0000items by simulating the popular items. To withstand TextSimu-like attacks, we\u0000further explore the detection approach for identifying LLM-generated\u0000promotional text. Extensive experiments conducted on three datasets demonstrate\u0000that TextSimu poses a more significant threat than existing poisoning attacks,\u0000while our defense method can detect malicious text of target items generated by\u0000TextSimu. By identifying the vulnerability, we aim to advance the development\u0000of more robust ID-free recommender systems.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142255161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: Generalized compression and compressive search of large datasets
Authors: Morgan E. Prior, Thomas Howard III, Emily Light, Najib Ishaq, Noah M. Daniels
arXiv:2409.12161 (arXiv - CS - Information Retrieval, 2024-09-18)
Abstract: The Big Data explosion has necessitated the development of search algorithms that scale sub-linearly in time and memory. While compression algorithms and search algorithms do exist independently, few algorithms offer both, and those which do are domain-specific. We present panCAKES, a novel approach to compressive search, i.e., a way to perform $k$-NN and $\rho$-NN search on compressed data while only decompressing a small, relevant portion of the data. panCAKES assumes the manifold hypothesis and leverages the low-dimensional structure of the data to compress and search it efficiently. panCAKES is generic over any distance function for which the distance between two points is proportional to the memory cost of storing an encoding of one in terms of the other. This property holds for many widely used distance functions, e.g., string edit distances (Levenshtein, Needleman-Wunsch, etc.) and set dissimilarity measures (Jaccard, Dice, etc.). We benchmark panCAKES on a variety of datasets, including genomic, proteomic, and set data. We compare compression ratios to gzip, and search performance between the compressed and uncompressed versions of the same dataset. panCAKES achieves compression ratios close to those of gzip, while offering sub-linear time performance for $k$-NN and $\rho$-NN search. We conclude that panCAKES is an efficient, general-purpose algorithm for exact compressive search on large datasets that obey the manifold hypothesis. We provide an open-source implementation of panCAKES in the Rust programming language.