Arnau Ramisa, Rene Vidal, Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Mahesh Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci
{"title":"推荐系统中的多模式生成模型","authors":"Arnau Ramisa, Rene Vidal, Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Mahesh Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci","doi":"arxiv-2409.10993","DOIUrl":null,"url":null,"abstract":"Many recommendation systems limit user inputs to text strings or behavior\nsignals such as clicks and purchases, and system outputs to a list of products\nsorted by relevance. With the advent of generative AI, users have come to\nexpect richer levels of interactions. In visual search, for example, a user may\nprovide a picture of their desired product along with a natural language\nmodification of the content of the picture (e.g., a dress like the one shown in\nthe picture but in red color). Moreover, users may want to better understand\nthe recommendations they receive by visualizing how the product fits their use\ncase, e.g., with a representation of how a garment might look on them, or how a\nfurniture item might look in their room. Such advanced levels of interaction\nrequire recommendation systems that are able to discover both shared and\ncomplementary information about the product across modalities, and visualize\nthe product in a realistic and informative way. However, existing systems often\ntreat multiple modalities independently: text search is usually done by\ncomparing the user query to product titles and descriptions, while visual\nsearch is typically done by comparing an image provided by the customer to\nproduct images. We argue that future recommendation systems will benefit from a\nmulti-modal understanding of the products that leverages the rich information\nretailers have about both customers and products to come up with the best\nrecommendations. In this chapter we review recommendation systems that use\nmultiple data modalities simultaneously.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"11 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-modal Generative Models in Recommendation System\",\"authors\":\"Arnau Ramisa, Rene Vidal, Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Mahesh Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci\",\"doi\":\"arxiv-2409.10993\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many recommendation systems limit user inputs to text strings or behavior\\nsignals such as clicks and purchases, and system outputs to a list of products\\nsorted by relevance. With the advent of generative AI, users have come to\\nexpect richer levels of interactions. In visual search, for example, a user may\\nprovide a picture of their desired product along with a natural language\\nmodification of the content of the picture (e.g., a dress like the one shown in\\nthe picture but in red color). Moreover, users may want to better understand\\nthe recommendations they receive by visualizing how the product fits their use\\ncase, e.g., with a representation of how a garment might look on them, or how a\\nfurniture item might look in their room. Such advanced levels of interaction\\nrequire recommendation systems that are able to discover both shared and\\ncomplementary information about the product across modalities, and visualize\\nthe product in a realistic and informative way. 
However, existing systems often\\ntreat multiple modalities independently: text search is usually done by\\ncomparing the user query to product titles and descriptions, while visual\\nsearch is typically done by comparing an image provided by the customer to\\nproduct images. We argue that future recommendation systems will benefit from a\\nmulti-modal understanding of the products that leverages the rich information\\nretailers have about both customers and products to come up with the best\\nrecommendations. In this chapter we review recommendation systems that use\\nmultiple data modalities simultaneously.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"11 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.10993\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10993","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-modal Generative Models in Recommendation System
Many recommendation systems limit user inputs to text strings or behavior signals such as clicks and purchases, and system outputs to a list of products sorted by relevance. With the advent of generative AI, users have come to expect richer forms of interaction. In visual search, for example, a user may provide a picture of their desired product along with a natural-language modification of its content (e.g., a dress like the one shown in the picture, but in red).
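As a concrete illustration, such a composed query can be approximated by embedding the reference image and the textual modification into a shared vision-language space and searching the catalog with a combination of the two. The sketch below is a minimal example of this idea using the openly available CLIP model from Hugging Face transformers; the checkpoint name, the simple additive fusion, and the in-memory catalog scoring are illustrative assumptions, not the method of any specific system.

```python
# Minimal sketch of a composed image + text query ("a dress like this, but in red").
# Assumptions: CLIP via Hugging Face transformers, additive fusion of the two
# embeddings, and a small in-memory catalog; production systems differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(img: Image.Image) -> torch.Tensor:
    inputs = processor(images=img, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)

def embed_text(text: str) -> torch.Tensor:
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)

def composed_query(reference: Image.Image, modification: str) -> torch.Tensor:
    # Simple additive fusion; dedicated composed-retrieval models learn this combination.
    q = embed_image(reference) + embed_text(modification)
    return torch.nn.functional.normalize(q, dim=-1)

def rank_catalog(query: torch.Tensor, catalog_embs: torch.Tensor, top_k: int = 5):
    # catalog_embs: (N, D) matrix of unit-normalized product image embeddings.
    scores = catalog_embs @ query.squeeze(0)  # cosine similarity on unit vectors
    return torch.topk(scores, k=min(top_k, len(scores)))
```

In practice the catalog would hold one (or several) precomputed image embeddings per product, so that ranking reduces to a nearest-neighbor lookup over the index.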
Moreover, users may want to better understand the recommendations they receive by visualizing how a product fits their use case, e.g., with a representation of how a garment might look on them, or how a furniture item might look in their room. Such advanced interactions require recommendation systems that can discover both shared and complementary information about a product across modalities, and visualize the product in a realistic and informative way.
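The visualization side can, in the simplest case, be prototyped with an off-the-shelf text-to-image diffusion model whose prompt is conditioned on the product and the user's context. The sketch below uses the Hugging Face diffusers library; the checkpoint and the prompt template are illustrative assumptions, and realistic virtual try-on or "view in my room" experiences require dedicated image-conditioned models rather than a plain text prompt.

```python
# Minimal sketch: visualizing a recommended item in the user's context with a
# generic text-to-image model. Assumptions: the diffusers library and the
# checkpoint named below; real try-on systems condition on the user's own
# photo, not only on text.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def visualize_recommendation(product_description: str, user_context: str):
    prompt = f"{product_description}, shown {user_context}, photorealistic"
    return pipe(prompt, num_inference_steps=30).images[0]

# e.g. visualize_recommendation("a mid-century oak armchair", "in a bright living room")
```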
However, existing systems often treat multiple modalities independently: text search is usually done by comparing the user query to product titles and descriptions, while visual search is typically done by comparing an image provided by the customer to product images.
We argue that future recommendation systems will benefit from a multi-modal understanding of products that leverages the rich information retailers have about both customers and products to produce the best recommendations. In this chapter we review recommendation systems that use multiple data modalities simultaneously.
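To make the contrast with unimodal pipelines concrete, the sketch below indexes each product by a fused representation of its image and its title/description in the same vision-language space as the earlier example, so a query from either modality (or both) is scored against a single joint product vector. It reuses the embed_image and embed_text helpers from the first sketch, and the fixed 0.5/0.5 weighting is an illustrative assumption rather than a recommended design.

```python
# Minimal sketch of a joint multi-modal product representation: each catalog item
# is indexed by a weighted combination of its image and text embeddings, reusing
# embed_image/embed_text from the earlier CLIP sketch. The equal weighting is an
# illustrative assumption; learned fusion generally performs better.
import torch
from PIL import Image

def product_embedding(image: Image.Image, title_and_description: str,
                      w_image: float = 0.5, w_text: float = 0.5) -> torch.Tensor:
    fused = w_image * embed_image(image) + w_text * embed_text(title_and_description)
    return torch.nn.functional.normalize(fused, dim=-1)

def build_index(products) -> torch.Tensor:
    # products: iterable of (image, text) pairs; returns an (N, D) matrix of unit vectors.
    return torch.cat([product_embedding(img, txt) for img, txt in products], dim=0)

def search(query_emb: torch.Tensor, index: torch.Tensor, top_k: int = 5):
    scores = index @ query_emb.squeeze(0)
    return torch.topk(scores, k=min(top_k, index.shape[0]))
```

With a single fused index, a text query, an image query, or a composed query like the one above can all be answered against the same product vectors instead of maintaining separate text-only and image-only search paths.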