{"title":"ATFLRec:通过指令调谐大语言模型实现音频-文本融合和低级别自适应的多模态推荐系统","authors":"Zezheng Qin","doi":"arxiv-2409.08543","DOIUrl":null,"url":null,"abstract":"Recommender Systems (RS) play a pivotal role in boosting user satisfaction by\nproviding personalized product suggestions in domains such as e-commerce and\nentertainment. This study examines the integration of multimodal data text and\naudio into large language models (LLMs) with the aim of enhancing\nrecommendation performance. Traditional text and audio recommenders encounter\nlimitations such as the cold-start problem, and recent advancements in LLMs,\nwhile promising, are computationally expensive. To address these issues,\nLow-Rank Adaptation (LoRA) is introduced, which enhances efficiency without\ncompromising performance. The ATFLRec framework is proposed to integrate audio\nand text modalities into a multimodal recommendation system, utilizing various\nLoRA configurations and modality fusion techniques. Results indicate that\nATFLRec outperforms baseline models, including traditional and graph neural\nnetwork-based approaches, achieving higher AUC scores. Furthermore, separate\nfine-tuning of audio and text data with distinct LoRA modules yields optimal\nperformance, with different pooling methods and Mel filter bank numbers\nsignificantly impacting performance. This research offers valuable insights\ninto optimizing multimodal recommender systems and advancing the integration of\ndiverse data modalities in LLMs.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"43 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ATFLRec: A Multimodal Recommender System with Audio-Text Fusion and Low-Rank Adaptation via Instruction-Tuned Large Language Model\",\"authors\":\"Zezheng Qin\",\"doi\":\"arxiv-2409.08543\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recommender Systems (RS) play a pivotal role in boosting user satisfaction by\\nproviding personalized product suggestions in domains such as e-commerce and\\nentertainment. This study examines the integration of multimodal data text and\\naudio into large language models (LLMs) with the aim of enhancing\\nrecommendation performance. Traditional text and audio recommenders encounter\\nlimitations such as the cold-start problem, and recent advancements in LLMs,\\nwhile promising, are computationally expensive. To address these issues,\\nLow-Rank Adaptation (LoRA) is introduced, which enhances efficiency without\\ncompromising performance. The ATFLRec framework is proposed to integrate audio\\nand text modalities into a multimodal recommendation system, utilizing various\\nLoRA configurations and modality fusion techniques. Results indicate that\\nATFLRec outperforms baseline models, including traditional and graph neural\\nnetwork-based approaches, achieving higher AUC scores. Furthermore, separate\\nfine-tuning of audio and text data with distinct LoRA modules yields optimal\\nperformance, with different pooling methods and Mel filter bank numbers\\nsignificantly impacting performance. 
This research offers valuable insights\\ninto optimizing multimodal recommender systems and advancing the integration of\\ndiverse data modalities in LLMs.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"43 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.08543\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08543","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
ATFLRec: A Multimodal Recommender System with Audio-Text Fusion and Low-Rank Adaptation via Instruction-Tuned Large Language Model
Recommender Systems (RS) play a pivotal role in boosting user satisfaction by providing personalized product suggestions in domains such as e-commerce and entertainment. This study examines the integration of multimodal data (text and audio) into large language models (LLMs) with the aim of enhancing recommendation performance. Traditional text and audio recommenders face limitations such as the cold-start problem, and recent LLM-based approaches, while promising, are computationally expensive. To address these issues, Low-Rank Adaptation (LoRA) is introduced, which improves efficiency without compromising performance.
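For intuition, a minimal PyTorch sketch of the LoRA idea follows; the rank r, scaling alpha, and layer sizes are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of LoRA: the frozen pretrained weight W is augmented with a
# trainable low-rank update B @ A, so only r * (d_in + d_out) parameters are
# trained per adapted layer. Hyperparameters here are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # A starts small and random, B starts at zero, so training begins
        # exactly at the pretrained model's behavior.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))  # shape: (2, 768)
```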
The ATFLRec framework is proposed to integrate the audio and text modalities into a multimodal recommendation system, utilizing various LoRA configurations and modality fusion techniques.
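One common fusion strategy is late fusion by concatenation; the sketch below illustrates that approach under assumed embedding dimensions, since the abstract does not specify ATFLRec's exact fusion operator.

```python
# Hedged sketch of late audio-text fusion: pooled per-item embeddings from
# each modality are concatenated and projected to a shared dimension.
# Dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    def __init__(self, audio_dim: int, text_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(audio_dim + text_dim, out_dim)

    def forward(self, audio_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # fuse per-item audio and text representations into one vector
        return self.proj(torch.cat([audio_emb, text_emb], dim=-1))

fusion = ConcatFusion(audio_dim=512, text_dim=768, out_dim=768)
fused = fusion(torch.randn(4, 512), torch.randn(4, 768))  # shape: (4, 768)
```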
Results indicate that ATFLRec outperforms baseline models, including traditional and graph neural network-based approaches, achieving higher AUC scores.
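AUC, the reported metric, is the probability that a positive (interacted) item is scored above a negative one; a minimal check with scikit-learn on toy data:

```python
# Toy AUC computation; the labels and scores below are fabricated for
# illustration, not results from the paper.
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0]              # ground-truth interactions (toy data)
y_score = [0.9, 0.3, 0.6, 0.8, 0.4]   # model relevance scores
print(roc_auc_score(y_true, y_score))  # 1.0: every positive outranks every negative
```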
Furthermore, fine-tuning audio and text data separately with distinct LoRA modules yields the best performance, and both the pooling method and the number of Mel filter banks significantly affect results.
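The two audio-frontend knobs named above can be explored with torchaudio; the sample rate, candidate n_mels values, and pooling choices below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of the abstract's two audio hyperparameters: the number of Mel
# filter banks (n_mels) and the temporal pooling method (mean vs. max).
import torch
import torchaudio

waveform = torch.randn(1, 16000)  # 1 second of synthetic 16 kHz audio
for n_mels in (40, 64, 80):       # candidate Mel filter bank sizes (assumed)
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=n_mels)(waveform)
    mean_pooled = mel.mean(dim=-1)  # average over time: (1, n_mels)
    max_pooled = mel.amax(dim=-1)   # max over time: (1, n_mels)
    print(n_mels, mean_pooled.shape, max_pooled.shape)
```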
This research offers valuable insights into optimizing multimodal recommender systems and advancing the integration of diverse data modalities in LLMs.