{"title":"基于弱几何扭曲的细粒度时尚图像检索对比学习","authors":"Ling Xiao;Toshihiko Yamasaki","doi":"10.1109/TAI.2025.3545791","DOIUrl":null,"url":null,"abstract":"This article addresses fine-grained fashion image retrieval (FIR), which aims at the detailed and precise retrieval of fashion items from extensive databases. Conventional fine-grained FIR methods design complex attention modules to enhance attribute-aware feature discrimination. However, they often ignore the multiview characteristics of real-world fashion data, leading to diminished model accuracy. Furthermore, our empirical analysis revealed that the straightforward application of standard contrastive learning methods to fine-grained FIR often yields suboptimal results. To alleviate this issue, we propose a novel weak geometrical distortion-based contrastive learning (GeoDCL) strategy. Specifically, GeoDCL incorporates both a novel positive pair design and a novel contrastive loss. GeoDCL can be seamlessly integrated into state-of-the-art (SOTA) fine-grained FIR methods during the training stage to enhance performance during inference. When GeoDCL is applied, the model structures of SOTA methods require no modifications. Additionally, GeoDCL is not utilized during inference, ensuring no increase in inference time. Experiments on the FashionAI, DeepFashion, and Zappos50K datasets verified GeoDCL's effectiveness in consistently improving SOTA models. In particular, GeoDCL drastically improved ASENet_V2 from 60.76% to 66.48% in mAP on the FashionAI dataset.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 9","pages":"2409-2421"},"PeriodicalIF":0.0000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GeoDCL: Weak Geometrical Distortion-Based Contrastive Learning for Fine-Grained Fashion Image Retrieval\",\"authors\":\"Ling Xiao;Toshihiko Yamasaki\",\"doi\":\"10.1109/TAI.2025.3545791\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article addresses fine-grained fashion image retrieval (FIR), which aims at the detailed and precise retrieval of fashion items from extensive databases. Conventional fine-grained FIR methods design complex attention modules to enhance attribute-aware feature discrimination. However, they often ignore the multiview characteristics of real-world fashion data, leading to diminished model accuracy. Furthermore, our empirical analysis revealed that the straightforward application of standard contrastive learning methods to fine-grained FIR often yields suboptimal results. To alleviate this issue, we propose a novel weak geometrical distortion-based contrastive learning (GeoDCL) strategy. Specifically, GeoDCL incorporates both a novel positive pair design and a novel contrastive loss. GeoDCL can be seamlessly integrated into state-of-the-art (SOTA) fine-grained FIR methods during the training stage to enhance performance during inference. When GeoDCL is applied, the model structures of SOTA methods require no modifications. Additionally, GeoDCL is not utilized during inference, ensuring no increase in inference time. Experiments on the FashionAI, DeepFashion, and Zappos50K datasets verified GeoDCL's effectiveness in consistently improving SOTA models. 
In particular, GeoDCL drastically improved ASENet_V2 from 60.76% to 66.48% in mAP on the FashionAI dataset.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":\"6 9\",\"pages\":\"2409-2421\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10908573/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10908573/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: This article addresses fine-grained fashion image retrieval (FIR), which aims at the detailed and precise retrieval of fashion items from extensive databases. Conventional fine-grained FIR methods design complex attention modules to enhance attribute-aware feature discrimination. However, they often ignore the multiview characteristics of real-world fashion data, which reduces model accuracy. Furthermore, our empirical analysis revealed that directly applying standard contrastive learning methods to fine-grained FIR often yields suboptimal results. To address this issue, we propose a novel weak geometrical distortion-based contrastive learning (GeoDCL) strategy. Specifically, GeoDCL incorporates both a novel positive pair design and a novel contrastive loss. GeoDCL can be seamlessly integrated into state-of-the-art (SOTA) fine-grained FIR methods during the training stage to enhance performance during inference. When GeoDCL is applied, the model structures of SOTA methods require no modifications. Additionally, GeoDCL is not used during inference, so it adds no inference time. Experiments on the FashionAI, DeepFashion, and Zappos50K datasets verified GeoDCL's effectiveness in consistently improving SOTA models. In particular, GeoDCL improved the mAP of ASENet_V2 on the FashionAI dataset from 60.76% to 66.48%.
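To make the general idea concrete, the sketch below shows one way weak geometric distortions could be used to build positive pairs for contrastive training, combined with a generic InfoNCE-style loss. The distortion parameters (small affine and perspective perturbations) and the loss formulation are illustrative assumptions for this sketch; they are not the paper's actual positive pair design or its proposed contrastive loss.

```python
# Minimal sketch (not the authors' implementation): weak geometric distortion
# as a positive-pair augmentation plus a standard InfoNCE-style contrastive loss.
import torch
import torch.nn.functional as F
from torchvision import transforms

# Assumed "weak" distortion: small rotation, translation, scaling, and a mild
# perspective change, so the distorted view stays attribute-consistent.
weak_geometric_distortion = transforms.Compose([
    transforms.RandomAffine(degrees=5, translate=(0.02, 0.02), scale=(0.95, 1.05)),
    transforms.RandomPerspective(distortion_scale=0.05, p=0.5),
])

def make_positive_pair(image):
    """Return the original image and a weakly distorted view as a positive pair.

    `image` is a PIL image or an image tensor accepted by torchvision transforms.
    """
    return image, weak_geometric_distortion(image)

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE loss between two batches of embeddings.

    z1, z2: (batch, dim) embeddings of the two views; matching rows are positives,
    all other rows in the batch serve as negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                 # (batch, batch) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Symmetric cross-entropy: each view should retrieve its own counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Usage with random embeddings standing in for encoder outputs:
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = info_nce_loss(z1, z2)
```

Because the distortion is only used to build training pairs, a retrieval model trained this way keeps its original architecture and incurs no extra cost at inference, consistent with the plug-in behavior described in the abstract.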