{"title":"使用大型多模态模型预测服装兼容性","authors":"Chia-Ling Chang , Yen-Liang Chen , Dao-Xuan Jiang","doi":"10.1016/j.dss.2025.114457","DOIUrl":null,"url":null,"abstract":"<div><div>Outfit coordination is a direct way for people to express themselves. However, judging the compatibility between tops and bottoms requires considering multiple factors such as color and style. This process is time-consuming and prone to errors. In recent years, the development of large language models and large multi-modal models has transformed many application fields. This study aims to explore how to leverage these models to achieve breakthroughs in fashion outfit recommendations.</div><div>This research combines the keyword response text from the large language model Gemini in the Vision Question Answering (VQA) task with the deep feature fusion technology of the large multi-modal model Beit3. By providing only image data of the clothing, users can evaluate the compatibility of tops and bottoms, making the process more convenient. Our proposed model, the Large Multi-modality Language Model for Outfit Recommendation (LMLMO), outperforms previously proposed models on the FashionVC and Evaluation3 datasets. Moreover, experimental results show that different types of keyword responses have varying impacts on the model, offering new directions and insights for future research.</div></div>","PeriodicalId":55181,"journal":{"name":"Decision Support Systems","volume":"194 ","pages":"Article 114457"},"PeriodicalIF":6.7000,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Using large multimodal models to predict outfit compatibility\",\"authors\":\"Chia-Ling Chang , Yen-Liang Chen , Dao-Xuan Jiang\",\"doi\":\"10.1016/j.dss.2025.114457\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Outfit coordination is a direct way for people to express themselves. However, judging the compatibility between tops and bottoms requires considering multiple factors such as color and style. This process is time-consuming and prone to errors. In recent years, the development of large language models and large multi-modal models has transformed many application fields. This study aims to explore how to leverage these models to achieve breakthroughs in fashion outfit recommendations.</div><div>This research combines the keyword response text from the large language model Gemini in the Vision Question Answering (VQA) task with the deep feature fusion technology of the large multi-modal model Beit3. By providing only image data of the clothing, users can evaluate the compatibility of tops and bottoms, making the process more convenient. Our proposed model, the Large Multi-modality Language Model for Outfit Recommendation (LMLMO), outperforms previously proposed models on the FashionVC and Evaluation3 datasets. 
Moreover, experimental results show that different types of keyword responses have varying impacts on the model, offering new directions and insights for future research.</div></div>\",\"PeriodicalId\":55181,\"journal\":{\"name\":\"Decision Support Systems\",\"volume\":\"194 \",\"pages\":\"Article 114457\"},\"PeriodicalIF\":6.7000,\"publicationDate\":\"2025-04-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Decision Support Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167923625000582\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Decision Support Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167923625000582","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Outfit coordination is a direct way for people to express themselves. However, judging the compatibility between tops and bottoms requires weighing multiple factors, such as color and style, a process that is time-consuming and error-prone. In recent years, the development of large language models and large multi-modal models has transformed many application fields. This study explores how to leverage these models to achieve breakthroughs in fashion outfit recommendation.
This research combines the keyword response text produced by the large language model Gemini in a Visual Question Answering (VQA) task with the deep feature fusion of the large multi-modal model Beit3. Users need only provide images of the garments to evaluate the compatibility of tops and bottoms, which makes the process more convenient. Our proposed model, the Large Multi-modality Language Model for Outfit Recommendation (LMLMO), outperforms previously proposed models on the FashionVC and Evaluation3 datasets. Moreover, experimental results show that different types of keyword responses affect the model to varying degrees, offering new directions and insights for future research.
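The abstract only outlines the pipeline, so the following Python sketch is a rough illustration of how Gemini keyword responses might be fused with garment-image embeddings in a late-fusion compatibility classifier. Everything here is an assumption for illustration, not the paper's implementation: query_gemini_keywords, CompatibilityHead, the 768-dimensional embeddings, and the placeholder prompt are all hypothetical, and the google-generativeai SDK call is just one plausible way to obtain VQA keywords.

# Hypothetical sketch of an LMLMO-style pipeline (not the paper's code).
# Assumes: the google-generativeai SDK for the Gemini VQA step, and a
# generic 768-dim multi-modal encoder standing in for Beit3.
import torch
import torch.nn as nn


def query_gemini_keywords(image_path: str) -> str:
    """Ask Gemini for color/style keywords of one garment image.

    The prompt below is a placeholder; the paper's actual VQA prompts
    and keyword types are not reproduced here. Assumes an API key is
    already configured for the SDK (e.g., via GOOGLE_API_KEY).
    """
    import google.generativeai as genai
    from PIL import Image

    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        ["List keywords describing this garment's color and style.",
         Image.open(image_path)]
    )
    return response.text


class CompatibilityHead(nn.Module):
    """Late fusion of image and keyword-text embeddings into a
    binary top-bottom compatibility score."""

    def __init__(self, dim: int = 768):
        super().__init__()
        # Concatenate four embeddings: top image, bottom image,
        # top keywords, bottom keywords.
        self.fuse = nn.Sequential(
            nn.Linear(4 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 1),
        )

    def forward(self, top_img, bottom_img, top_kw, bottom_kw):
        z = torch.cat([top_img, bottom_img, top_kw, bottom_kw], dim=-1)
        return torch.sigmoid(self.fuse(z))  # probability the outfit matches


if __name__ == "__main__":
    # Smoke test with random tensors standing in for real encoder outputs.
    head = CompatibilityHead()
    emb = lambda: torch.randn(1, 768)
    print(head(emb(), emb(), emb(), emb()).item())

A real system would replace the random tensors with Beit3-style encodings of the garment images and of the Gemini keyword text; the late-fusion head shown here is the simplest way to combine the two signal types and is chosen purely for clarity.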
Journal introduction:
The common thread of articles published in Decision Support Systems is their relevance to theoretical and technical issues in the support of enhanced decision making. The areas addressed may include foundations, functionality, interfaces, implementation, impacts, and evaluation of decision support systems (DSSs).