Zongbao Liang, Liu Yang, YunFei Yuan, Bo Chen, Feifei Tang
{"title":"基于图像和文本组合的服装检索","authors":"Zongbao Liang, Liu Yang, YunFei Yuan, Bo Chen, Feifei Tang","doi":"10.1109/ICSAI53574.2021.9664041","DOIUrl":null,"url":null,"abstract":"Image text combination retrieval is a new direction in multimodal retrieval, in which query is composed of image and modified text. The retrieved target image should not only be similar to the query image, but also have the change specified by the modified text. The traditional clothing retrieval adopts the single-mode retrieval method of image search or text search, which is lack of retrieval flexibility. To solve the problem of feature fusion caused by semantic differences between clothing image and text, this paper proposes a multi-dimensional feature fusion model, which constructs a high-dimensional visual feature and semantic feature fusion model based on the scaling point product attention mechanism to extract high-dimensional fusion features, then the low dimension visual semantic fusion features are used as the residual of high dimension fusion features for target image retrieval. 
Compared with the previous feature fusion methods, the recall rate of Top1 on Fashion200k data set is increased by 15.4%, which is obviously superior to most of the existing graph and text feature fusion models, which shows that the model is advanced and effective.","PeriodicalId":131284,"journal":{"name":"2021 7th International Conference on Systems and Informatics (ICSAI)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Clothing retrieval based by image and text combination\",\"authors\":\"Zongbao Liang, Liu Yang, YunFei Yuan, Bo Chen, Feifei Tang\",\"doi\":\"10.1109/ICSAI53574.2021.9664041\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Image text combination retrieval is a new direction in multimodal retrieval, in which query is composed of image and modified text. The retrieved target image should not only be similar to the query image, but also have the change specified by the modified text. The traditional clothing retrieval adopts the single-mode retrieval method of image search or text search, which is lack of retrieval flexibility. To solve the problem of feature fusion caused by semantic differences between clothing image and text, this paper proposes a multi-dimensional feature fusion model, which constructs a high-dimensional visual feature and semantic feature fusion model based on the scaling point product attention mechanism to extract high-dimensional fusion features, then the low dimension visual semantic fusion features are used as the residual of high dimension fusion features for target image retrieval. 
Compared with the previous feature fusion methods, the recall rate of Top1 on Fashion200k data set is increased by 15.4%, which is obviously superior to most of the existing graph and text feature fusion models, which shows that the model is advanced and effective.\",\"PeriodicalId\":131284,\"journal\":{\"name\":\"2021 7th International Conference on Systems and Informatics (ICSAI)\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 7th International Conference on Systems and Informatics (ICSAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSAI53574.2021.9664041\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 7th International Conference on Systems and Informatics (ICSAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSAI53574.2021.9664041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Clothing retrieval based on image and text combination
Image-text combination retrieval is a new direction in multimodal retrieval in which the query consists of an image together with modifying text. The retrieved target image should not only resemble the query image but also exhibit the change specified by the text. Traditional clothing retrieval uses single-modality methods, searching by image or by text alone, which limits retrieval flexibility. To address the feature-fusion problem caused by the semantic gap between clothing images and text, this paper proposes a multi-dimensional feature fusion model: it constructs a high-dimensional fusion of visual and semantic features based on the scaled dot-product attention mechanism, and then uses the low-dimensional visual-semantic fusion features as a residual of the high-dimensional fusion features for target image retrieval. Compared with previous feature fusion methods, the model improves Top-1 recall on the Fashion200k dataset by 15.4%, clearly outperforming most existing image-and-text feature fusion models and demonstrating its effectiveness.
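The abstract does not give the model's architecture, but the core operations it names — scaled dot-product attention over visual and semantic features, with a low-dimensional fusion used as a residual — can be sketched generically. The following NumPy sketch is an illustration under assumptions, not the paper's actual model: the feature dimensions, the choice of text features as queries and image features as keys/values, and the element-wise sum standing in for the "low-dimensional visual-semantic fusion" are all hypothetical.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Standard scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def fuse_features(img_feat, txt_feat):
    """Illustrative fusion: attention output plus a simple fusion residual.

    Assumption: text features act as queries attending over image
    features ("high-dimensional" fusion), and an element-wise sum
    stands in for the low-dimensional visual-semantic fusion that the
    paper adds as a residual.
    """
    high = scaled_dot_product_attention(txt_feat, img_feat, img_feat)
    low = img_feat + txt_feat  # placeholder low-dimensional fusion
    return high + low          # residual combination

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 64))  # 4 image region features, dim 64 (assumed)
txt = rng.normal(size=(4, 64))  # 4 text token features, dim 64 (assumed)
fused = fuse_features(img, txt)
print(fused.shape)  # (4, 64)
```

At retrieval time, such a fused query representation would typically be compared against gallery image embeddings by cosine similarity, with Top-1 recall measuring how often the correct target ranks first.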