{"title":"基于弱几何扭曲的细粒度时尚图像检索对比学习","authors":"Ling Xiao;Toshihiko Yamasaki","doi":"10.1109/TAI.2025.3545791","DOIUrl":null,"url":null,"abstract":"This article addresses fine-grained fashion image retrieval (FIR), which aims at the detailed and precise retrieval of fashion items from extensive databases. Conventional fine-grained FIR methods design complex attention modules to enhance attribute-aware feature discrimination. However, they often ignore the multiview characteristics of real-world fashion data, leading to diminished model accuracy. Furthermore, our empirical analysis revealed that the straightforward application of standard contrastive learning methods to fine-grained FIR often yields suboptimal results. To alleviate this issue, we propose a novel weak geometrical distortion-based contrastive learning (GeoDCL) strategy. Specifically, GeoDCL incorporates both a novel positive pair design and a novel contrastive loss. GeoDCL can be seamlessly integrated into state-of-the-art (SOTA) fine-grained FIR methods during the training stage to enhance performance during inference. When GeoDCL is applied, the model structures of SOTA methods require no modifications. Additionally, GeoDCL is not utilized during inference, ensuring no increase in inference time. Experiments on the FashionAI, DeepFashion, and Zappos50K datasets verified GeoDCL's effectiveness in consistently improving SOTA models. In particular, GeoDCL drastically improved ASENet_V2 from 60.76% to 66.48% in mAP on the FashionAI dataset.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 9","pages":"2409-2421"},"PeriodicalIF":0.0000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GeoDCL: Weak Geometrical Distortion-Based Contrastive Learning for Fine-Grained Fashion Image Retrieval\",\"authors\":\"Ling Xiao;Toshihiko Yamasaki\",\"doi\":\"10.1109/TAI.2025.3545791\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article addresses fine-grained fashion image retrieval (FIR), which aims at the detailed and precise retrieval of fashion items from extensive databases. Conventional fine-grained FIR methods design complex attention modules to enhance attribute-aware feature discrimination. However, they often ignore the multiview characteristics of real-world fashion data, leading to diminished model accuracy. Furthermore, our empirical analysis revealed that the straightforward application of standard contrastive learning methods to fine-grained FIR often yields suboptimal results. To alleviate this issue, we propose a novel weak geometrical distortion-based contrastive learning (GeoDCL) strategy. Specifically, GeoDCL incorporates both a novel positive pair design and a novel contrastive loss. GeoDCL can be seamlessly integrated into state-of-the-art (SOTA) fine-grained FIR methods during the training stage to enhance performance during inference. When GeoDCL is applied, the model structures of SOTA methods require no modifications. Additionally, GeoDCL is not utilized during inference, ensuring no increase in inference time. Experiments on the FashionAI, DeepFashion, and Zappos50K datasets verified GeoDCL's effectiveness in consistently improving SOTA models. 
In particular, GeoDCL drastically improved ASENet_V2 from 60.76% to 66.48% in mAP on the FashionAI dataset.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":\"6 9\",\"pages\":\"2409-2421\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10908573/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10908573/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: This article addresses fine-grained fashion image retrieval (FIR), which aims at the detailed and precise retrieval of fashion items from extensive databases. Conventional fine-grained FIR methods design complex attention modules to enhance attribute-aware feature discrimination. However, they often ignore the multiview characteristics of real-world fashion data, which reduces model accuracy. Furthermore, our empirical analysis revealed that directly applying standard contrastive learning methods to fine-grained FIR often yields suboptimal results. To address this issue, we propose a novel weak geometrical distortion-based contrastive learning (GeoDCL) strategy. Specifically, GeoDCL incorporates both a novel positive pair design and a novel contrastive loss. GeoDCL can be seamlessly integrated into state-of-the-art (SOTA) fine-grained FIR methods during the training stage to enhance performance during inference. When GeoDCL is applied, the model structures of SOTA methods require no modifications. Additionally, GeoDCL is not used during inference, so it adds no inference time. Experiments on the FashionAI, DeepFashion, and Zappos50K datasets verified GeoDCL's effectiveness in consistently improving SOTA models. In particular, GeoDCL improved the mAP of ASENet_V2 on the FashionAI dataset from 60.76% to 66.48%.
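To make the general idea concrete, the sketch below shows one way weak geometric distortions could be used to build positive pairs for contrastive training, combined with a generic InfoNCE-style loss. The distortion parameters (small affine and perspective perturbations) and the loss formulation are illustrative assumptions for this sketch; they are not the paper's actual positive pair design or its proposed contrastive loss.

```python
# Minimal sketch (not the authors' implementation): weak geometric distortion
# as a positive-pair augmentation plus a standard InfoNCE-style contrastive loss.
import torch
import torch.nn.functional as F
from torchvision import transforms

# Assumed "weak" distortion: small rotation, translation, scaling, and a mild
# perspective change, so the distorted view stays attribute-consistent.
weak_geometric_distortion = transforms.Compose([
    transforms.RandomAffine(degrees=5, translate=(0.02, 0.02), scale=(0.95, 1.05)),
    transforms.RandomPerspective(distortion_scale=0.05, p=0.5),
])

def make_positive_pair(image):
    """Return the original image and a weakly distorted view as a positive pair.

    `image` is a PIL image or an image tensor accepted by torchvision transforms.
    """
    return image, weak_geometric_distortion(image)

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE loss between two batches of embeddings.

    z1, z2: (batch, dim) embeddings of the two views; matching rows are positives,
    all other rows in the batch serve as negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                 # (batch, batch) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Symmetric cross-entropy: each view should retrieve its own counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Usage with random embeddings standing in for encoder outputs:
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = info_nce_loss(z1, z2)
```

Because the distortion is only used to build training pairs, a retrieval model trained this way keeps its original architecture and incurs no extra cost at inference, consistent with the plug-in behavior described in the abstract.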