Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference

arXiv - CS - Information Retrieval, Pub Date: 2024-09-18, DOI: arxiv-2409.12150

Najmeh Forouzandehmehr, Nima Farrokhsiar, Ramin Giahi, Evren Korpeoglu, Kannan Achan



Abstract

Personalized outfit recommendation remains a complex challenge, demanding both an understanding of fashion compatibility and an awareness of trends. This paper presents a novel framework that harnesses the expressive power of large language models (LLMs) for this task, mitigating their "black box" and static nature through fine-tuning and direct feedback integration. We bridge the visual-textual gap in item descriptions by captioning item images with a Multimodal Large Language Model (MLLM). This enables the LLM to extract style and color characteristics from human-curated fashion images, forming the basis for personalized recommendations. The LLM is efficiently fine-tuned on the open-source Polyvore dataset of curated fashion images, optimizing its ability to recommend stylish outfits. A direct preference mechanism using negative examples is employed to enhance the LLM's decision-making process, creating a self-enhancing AI feedback loop that continuously refines recommendations in line with seasonal fashion trends. Our framework is evaluated on the Polyvore dataset on two key tasks: fill-in-the-blank and complementary item retrieval. These evaluations highlight the framework's ability to generate stylish, trend-aligned outfit suggestions that improve continuously through direct feedback. The results show that the proposed framework significantly outperforms the base LLM, producing more cohesive outfits. The improved performance on these tasks underscores the framework's potential to enhance the shopping experience with accurate suggestions and demonstrates its effectiveness over vanilla LLM-based outfit generation.
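Because the LLM works on text, each item image must first be turned into a caption before it can appear in a prompt. The abstract does not give the prompt format, so the sketch below is only an assumption of how MLLM captions might be assembled into a fill-in-the-blank question and how a one-letter reply might be parsed. The `Item` class, `build_fitb_prompt`, and `parse_choice` are hypothetical names, and the captions are assumed to come from an MLLM captioner.

```python
import re
import string
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Item:
    """A catalog item; `caption` is assumed to be produced by an MLLM captioner."""
    category: str
    caption: str


def build_fitb_prompt(outfit: Sequence[Item], candidates: Sequence[Item]) -> str:
    """Render a partial outfit plus lettered candidates as a text-only prompt."""
    if len(candidates) > len(string.ascii_uppercase):
        raise ValueError("too many candidates to label with single letters")
    lines = ["These items are part of one outfit:"]
    for item in outfit:
        lines.append(f"- {item.category}: {item.caption}")
    lines.append("Which candidate best completes the outfit? Answer with one letter.")
    for letter, item in zip(string.ascii_uppercase, candidates):
        lines.append(f"{letter}. {item.category}: {item.caption}")
    return "\n".join(lines)


def parse_choice(response: str, num_candidates: int) -> Optional[int]:
    """Return the index of the first standalone valid letter in the reply, else None.

    Simplification: a stray article such as "A" would be read as option A.
    """
    valid = string.ascii_uppercase[:num_candidates]
    for letter in re.findall(r"\b([A-Z])\b", response.upper()):
        if letter in valid:
            return valid.index(letter)
    return None
```

In this sketch a reply that names no valid letter yields `None`, which an evaluation loop could simply count as a miss.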
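The abstract does not spell out the objective behind its "direct preference mechanism using negative examples". Assuming it resembles Direct Preference Optimization (DPO), each training pair contrasts a preferred completion y_w (for instance, the item that actually belongs to a human-curated Polyvore outfit) with a rejected negative y_l (for instance, an item drawn from another outfit; this sampling choice is an assumption). The standard DPO loss is -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]). The minimal sketch below computes it from precomputed sequence log-probabilities rather than from a real model.

```python
import math
from typing import Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one (prompt, chosen, rejected) triple.

    Each argument is the summed token log-probability of a completion under
    the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin), which avoids overflow.
    return softplus(-margin)


def batch_dpo_loss(batch: Sequence[Tuple[float, float, float, float]],
                   beta: float = 0.1) -> float:
    """Mean DPO loss over a batch of log-probability tuples."""
    return sum(dpo_loss(*row, beta=beta) for row in batch) / len(batch)
```

When the policy still matches the reference, the loss is log 2. It falls as the policy raises the preferred item's likelihood relative to the negative. In a full training setup the log-probabilities would come from forward passes of the fine-tuned and reference LLMs; Hugging Face TRL's `DPOTrainer` is one existing implementation of this objective.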
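Fill-in-the-blank (FITB) removes one item from an outfit and asks the model to pick it from a small candidate set. Complementary item retrieval ranks candidates for a partial outfit. The sketch below scores both with any scoring function: FITB as top-1 accuracy and retrieval as recall@k. Recall@k is an assumed metric, since the abstract does not name the exact retrieval measure. `overlap_scorer` is a hypothetical word-overlap placeholder standing in for the fine-tuned LLM's scores.

```python
from typing import Callable, List, Sequence, Tuple

# (partial-outfit captions, candidate captions, index of the ground-truth item)
Question = Tuple[Sequence[str], Sequence[str], int]
Scorer = Callable[[Sequence[str], str], float]


def overlap_scorer(outfit: Sequence[str], candidate: str) -> float:
    """Hypothetical placeholder: count words a candidate shares with the outfit."""
    outfit_words = set(" ".join(outfit).lower().split())
    return float(len(outfit_words & set(candidate.lower().split())))


def rank_candidates(outfit: Sequence[str], candidates: Sequence[str],
                    scorer: Scorer) -> List[int]:
    """Candidate indices by descending score; sorted() keeps ties in input order."""
    return sorted(range(len(candidates)),
                  key=lambda i: scorer(outfit, candidates[i]), reverse=True)


def fitb_accuracy(questions: Sequence[Question], scorer: Scorer) -> float:
    """Fraction of questions whose top-ranked candidate is the true item."""
    hits = sum(rank_candidates(o, c, scorer)[0] == answer for o, c, answer in questions)
    return hits / len(questions)


def recall_at_k(questions: Sequence[Question], scorer: Scorer, k: int) -> float:
    """Fraction of queries whose true item appears among the top-k candidates."""
    hits = sum(answer in rank_candidates(o, c, scorer)[:k] for o, c, answer in questions)
    return hits / len(questions)
```

Keeping the scorer as a parameter lets the base LLM and the fine-tuned LLM be compared on identical questions by swapping only the scoring function.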

