Enhancing fashion e-commerce retrieval: A self-supervised graph-integrated framework for cross-modal image–text alignment

IF 6.8 · CAS Zone 2 (Engineering & Technology) · Q1 ENGINEERING, MULTIDISCIPLINARY
Shuang Zhou, Nonlabile Binti Salleh Hudin
{"title":"Enhancing fashion e-commerce retrieval: A self-supervised graph-integrated framework for cross-modal image–text alignment","authors":"Shuang Zhou,&nbsp;Nonlabile Binti Salleh Hudin","doi":"10.1016/j.aej.2025.07.039","DOIUrl":null,"url":null,"abstract":"<div><div>Cross-modal retrieval is pivotal for enhancing user experiences in the rapidly evolving fashion e-commerce sector. However, conventional methods often struggle with fine-grained feature alignment and maintaining modal consistency. To overcome these challenges, we propose <strong>G-SimX</strong>, an innovative framework that synergizes self-supervised contrastive learning via SimCLR with Graph Convolutional Networks (GCN) for robust cross-modal image–text retrieval. By harnessing SimCLR, G-SimX significantly reduces reliance on annotated data, while the integration of GCN facilitates structured relational modeling to capture intricate semantic associations between product images and textual descriptions. Extensive experiments on the FashionGenAttnGAN dataset demonstrate that G-SimX achieves outstanding mAP scores of 0.856 for image-to-text and 0.842 for text-to-image retrieval—improving upon the CLIP baseline by over 20%. Moreover, our approach exhibits exceptional stability with reduced training loss fluctuations (0.161 vs. 0.189) and markedly enhanced robustness against data perturbations (0.92 vs. 0.76). Ablation studies further affirm the contributions of each module, revealing mAP drops of 11.7% and 16.9% upon removing SimCLR and GCN, respectively. With a convergence rate 95.8% faster than existing methods, G-SimX offers a scalable and efficient solution for real-world fashion e-commerce applications.</div></div>","PeriodicalId":7484,"journal":{"name":"alexandria engineering journal","volume":"128 ","pages":"Pages 1015-1026"},"PeriodicalIF":6.8000,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"alexandria engineering journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1110016825008622","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}

Abstract

Cross-modal retrieval is pivotal for enhancing user experiences in the rapidly evolving fashion e-commerce sector. However, conventional methods often struggle with fine-grained feature alignment and maintaining modal consistency. To overcome these challenges, we propose G-SimX, an innovative framework that synergizes self-supervised contrastive learning via SimCLR with Graph Convolutional Networks (GCN) for robust cross-modal image–text retrieval. By harnessing SimCLR, G-SimX significantly reduces reliance on annotated data, while the integration of GCN facilitates structured relational modeling to capture intricate semantic associations between product images and textual descriptions. Extensive experiments on the FashionGenAttnGAN dataset demonstrate that G-SimX achieves outstanding mAP scores of 0.856 for image-to-text and 0.842 for text-to-image retrieval—improving upon the CLIP baseline by over 20%. Moreover, our approach exhibits exceptional stability with reduced training loss fluctuations (0.161 vs. 0.189) and markedly enhanced robustness against data perturbations (0.92 vs. 0.76). Ablation studies further affirm the contributions of each module, revealing mAP drops of 11.7% and 16.9% upon removing SimCLR and GCN, respectively. With a convergence rate 95.8% faster than existing methods, G-SimX offers a scalable and efficient solution for real-world fashion e-commerce applications.
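The abstract does not give implementation details, but the combination it describes, SimCLR-style contrastive alignment plus a GCN over sample relations, can be sketched in a few lines of PyTorch. Everything below is an illustrative assumption rather than the paper's actual architecture: the module names, dimensions, the threshold-based graph construction, and the use of a symmetric NT-Xent (InfoNCE) loss over matched image–text pairs are all hypothetical.

```python
# Minimal sketch of SimCLR-style cross-modal contrastive alignment with a
# simple GCN refining image features over a similarity graph. All names,
# dimensions, and the graph construction are hypothetical illustrations,
# not the G-SimX authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(A_hat @ H @ W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Symmetrically normalize the adjacency matrix (with self-loops).
        a = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = a.sum(dim=1).clamp(min=1e-12).pow(-0.5)
        a_hat = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
        return F.relu(a_hat @ self.linear(h))


def nt_xent_cross_modal(img: torch.Tensor, txt: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """SimCLR-style (NT-Xent) loss: matched image-text pairs are
    positives, all other pairs in the batch are negatives."""
    img = F.normalize(img, dim=1)
    txt = F.normalize(txt, dim=1)
    logits = img @ txt.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric loss over both retrieval directions.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, feat_dim, emb_dim = 8, 512, 128

    # Stand-ins for encoder outputs (e.g. a CNN over product images and
    # a text encoder over their descriptions).
    img_feats = torch.randn(batch, feat_dim)
    txt_feats = torch.randn(batch, feat_dim)

    # Hypothetical graph: connect samples whose raw features are similar.
    with torch.no_grad():
        sim = F.normalize(img_feats, dim=1) @ F.normalize(img_feats, dim=1).t()
        adj = (sim > 0.0).float()

    gcn = SimpleGCNLayer(feat_dim, emb_dim)
    txt_proj = nn.Linear(feat_dim, emb_dim)

    img_emb = gcn(img_feats, adj)    # graph-refined image embeddings
    txt_emb = txt_proj(txt_feats)    # projected text embeddings

    loss = nt_xent_cross_modal(img_emb, txt_emb)
    print(f"contrastive loss: {loss.item():.4f}")
```

In this sketch the graph simply links samples whose raw image features exceed a similarity threshold; the paper's GCN presumably models richer semantic relations between products. The loss structure, however, is the core of contrastive cross-modal alignment: pulling matched image–text pairs together while pushing apart all other pairings in the batch.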

Source journal
Alexandria Engineering Journal (Engineering: General Engineering)
CiteScore: 11.20
Self-citation rate: 4.40%
Articles per year: 1015
Review period: 43 days
Journal description: Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the field of engineering and applied science. Alexandria Engineering Journal is cited in the Engineering Information Services (EIS) and the Chemical Abstracts (CA). The papers published in Alexandria Engineering Journal are grouped into five sections, according to the following classification:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering