Using large multimodal models to predict outfit compatibility

IF 6.7 · CAS Tier 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Chia-Ling Chang , Yen-Liang Chen , Dao-Xuan Jiang
DOI: 10.1016/j.dss.2025.114457
Journal: Decision Support Systems, Volume 194, Article 114457
Published: 2025-04-26 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0167923625000582
Citations: 0

Abstract

Outfit coordination is a direct way for people to express themselves. However, judging the compatibility between tops and bottoms requires considering multiple factors such as color and style. This process is time-consuming and prone to errors. In recent years, the development of large language models and large multi-modal models has transformed many application fields. This study aims to explore how to leverage these models to achieve breakthroughs in fashion outfit recommendations.
This research combines the keyword response text from the large language model Gemini in the Vision Question Answering (VQA) task with the deep feature fusion technology of the large multi-modal model Beit3. By providing only image data of the clothing, users can evaluate the compatibility of tops and bottoms, making the process more convenient. Our proposed model, the Large Multi-modality Language Model for Outfit Recommendation (LMLMO), outperforms previously proposed models on the FashionVC and Evaluation3 datasets. Moreover, experimental results show that different types of keyword responses have varying impacts on the model, offering new directions and insights for future research.
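The abstract describes a late-fusion design: image features from a large multimodal encoder (BEiT-3 in the paper) are combined with keyword-response text obtained from Gemini via VQA, and the fused representation is scored for top–bottom compatibility. The sketch below illustrates that fusion-and-score pattern only; the encoders, dimensions, function names, and the untrained scoring head are all hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an image encoder (the paper uses BEiT-3 features):
# a deterministic pseudo-random vector per image identifier.
def encode_image(image_id: str, dim: int = 16) -> np.ndarray:
    seed = abs(hash(image_id)) % (2 ** 32)
    return np.random.default_rng(seed).normal(size=dim)

# Hypothetical stand-in for keyword-text features (the paper derives keywords
# from Gemini VQA responses): mean of per-keyword pseudo-random vectors.
def encode_keywords(keywords: list[str], dim: int = 16) -> np.ndarray:
    vecs = [np.random.default_rng(abs(hash(k)) % (2 ** 32)).normal(size=dim)
            for k in keywords]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def compatibility_score(top_img: str, bottom_img: str,
                        top_kw: list[str], bottom_kw: list[str],
                        w: np.ndarray, b: float) -> float:
    """Late fusion: concatenate image and keyword features for both garments,
    then apply a logistic scoring head mapping to a [0, 1] compatibility score."""
    fused = np.concatenate([
        encode_image(top_img), encode_image(bottom_img),
        encode_keywords(top_kw), encode_keywords(bottom_kw),
    ])  # 4 x 16 = 64 dimensions
    return float(1.0 / (1.0 + np.exp(-(w @ fused + b))))

# Untrained random head, for illustration only.
w = rng.normal(scale=0.1, size=64)
score = compatibility_score("top_001.jpg", "bottom_042.jpg",
                            ["white", "linen", "casual"], ["navy", "chino"],
                            w, 0.0)
print(f"compatibility: {score:.3f}")
```

In a trained system, `w` and `b` would be learned from labeled compatible/incompatible outfit pairs (e.g. from FashionVC); here they only demonstrate the data flow from two modalities into a single compatibility score.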
Source journal: Decision Support Systems (Engineering & Technology — Computer Science: Artificial Intelligence)
CiteScore: 14.70
Self-citation rate: 6.70%
Articles per year: 119
Review time: 13 months
Journal description: The common thread of articles published in Decision Support Systems is their relevance to theoretical and technical issues in the support of enhanced decision making. The areas addressed may include foundations, functionality, interfaces, implementation, impacts, and evaluation of decision support systems (DSSs).