Fine-grained knowledge progressive network with multivariate enhancements for text-based person retrieval

IF 7.6 · Zone 1, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Kai Ren, Chuanping Hu, Hao Xi, Yongqiang Li, Jinhao Fan, Lihua Liu
Knowledge-Based Systems, vol. 325, Article 113999. Publication date: 2025-06-27. DOI: 10.1016/j.knosys.2025.113999
https://www.sciencedirect.com/science/article/pii/S0950705125010445
Citations: 0

Abstract

Fine-grained knowledge progressive network with multivariate enhancements for text-based person retrieval

Objective:

Text-Based Person Retrieval (TBPR) aims to match target images using natural language descriptions, yet it faces significant challenges such as complex visual structures, diverse semantic expressions, and limited annotated data. These issues lead to intra-modal knowledge uncertainty and weak inter-modal correlations. Existing approaches predominantly focus on explicit alignment between heterogeneous modalities, often overlooking the latent associations within homogeneous knowledge. This study addresses these limitations to enhance both the performance and efficiency of TBPR.

Method:

This paper proposes a Multivariate Enhancement Fine-Grained Knowledge Progressive Network (ME-FKPN) to address the challenge of text-based person retrieval across modalities. ME-FKPN enhances the synergy between homogeneous and heterogeneous knowledge in a hierarchical manner, progressively establishing more accurate semantic alignments between images and texts. The framework comprises three key innovations: Standardized Knowledge Anchor (SKA) constructs a knowledge graph to standardize semantics; Mixed of LoRA Experts (MoLE) integrates defogged color and grayscale images to extract multi-level visual features; Multivariate Knowledge Progressive Optimization Strategy (MKPOS) achieves steady performance improvements through hierarchical augmentation and staged training.
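The abstract does not describe the internals of MoLE, so the following is only a generic sketch of the underlying idea: several low-rank (LoRA) adapters attached to one frozen linear layer and blended by a learned softmax gate. The class name `LoRAExpertMixture`, the gating scheme, and every hyperparameter (`n_experts`, `rank`, `alpha`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class LoRAExpertMixture:
    """Hypothetical gated mixture of LoRA adapters on a frozen linear layer."""

    def __init__(self, d_in, d_out, n_experts=2, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen base weight (would come from a pretrained encoder in practice).
        self.W = rng.normal(0.0, 0.02, (d_in, d_out))
        # Per-expert low-rank factors: delta_W_e = A_e @ B_e.
        self.A = rng.normal(0.0, 0.02, (n_experts, d_in, rank))
        # Zero init, so every adapter starts as a no-op (standard LoRA practice).
        self.B = np.zeros((n_experts, rank, d_out))
        # Gating projection: one logit per expert for each input row.
        self.G = rng.normal(0.0, 0.02, (d_in, n_experts))
        self.scale = alpha / rank

    def __call__(self, x):
        gate = softmax(x @ self.G)  # (batch, n_experts), rows sum to 1
        # Low-rank update from each expert: (batch, n_experts, d_out).
        delta = np.einsum("bi,eir,ero->beo", x, self.A, self.B) * self.scale
        # Frozen path plus gate-weighted sum of the expert updates.
        return x @ self.W + np.einsum("be,beo->bo", gate, delta)


if __name__ == "__main__":
    layer = LoRAExpertMixture(d_in=8, d_out=5)
    x = np.random.default_rng(1).normal(size=(3, 8))
    print(layer(x).shape)
```

In a setup like the one the abstract outlines, one could imagine different experts specializing in different input views (for example color versus grayscale features), but that mapping is a guess and is not stated in the source.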

Novelty:

The proposed ME-FKPN model introduces a novel solution to the TBPR task by integrating hierarchical knowledge enhancement and progressive optimization strategies. By deeply mining latent associations among homogeneous knowledge and fostering collaborative representation of heterogeneous knowledge, the approach effectively overcomes the limitations of existing techniques in handling complex cross-modal relationships and data sparsity.

Findings:

ME-FKPN outperforms all state-of-the-art TBPR models across three public datasets. Notably, on the challenging UFine6926 ultra-fine-grained dataset, our method achieves improvements of 16.49%, 9.79%, 6.23%, and 14.69% on R@1, R@5, R@10, and mAP metrics, respectively, compared to mainstream approaches.
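For readers unfamiliar with the reported metrics, here is a minimal sketch of how R@K and mAP are commonly computed for text-to-image retrieval. It assumes a query-by-gallery similarity matrix and integer identity labels. The function name `retrieval_metrics` and its arguments are illustrative, and the paper's exact evaluation protocol may differ. R@K is taken here as the fraction of text queries with at least one correct image among the top K results, the usual convention in person retrieval.

```python
import numpy as np


def retrieval_metrics(sim, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Compute R@K and mAP from a (num_queries, num_gallery) similarity matrix.

    Higher similarity means a better match. A gallery image counts as correct
    when its identity label equals the query's identity label.
    """
    sim = np.asarray(sim, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    # Rank gallery items for each query, best match first.
    order = np.argsort(-sim, axis=1, kind="stable")
    hits = gallery_ids[order] == query_ids[:, None]  # (Q, G) boolean

    # R@K: share of queries with at least one correct item in the top K.
    recall = {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}

    # Average precision: mean of precision values at each correct rank.
    cum_hits = np.cumsum(hits, axis=1)
    ranks = np.arange(1, hits.shape[1] + 1)
    precision_at_hit = (cum_hits / ranks) * hits
    n_rel = hits.sum(axis=1)
    valid = n_rel > 0  # skip queries with no correct gallery item
    ap = precision_at_hit[valid].sum(axis=1) / n_rel[valid]
    m_ap = float(ap.mean()) if valid.any() else 0.0
    return recall, m_ap


if __name__ == "__main__":
    # Toy example: 2 text queries, 4 gallery images (made-up numbers).
    sim = [[0.9, 0.1, 0.5, 0.3],
           [0.8, 0.2, 0.6, 0.1]]
    recall, m_ap = retrieval_metrics(sim, query_ids=[0, 1], gallery_ids=[0, 0, 1, 2])
    print(recall, m_ap)
```

Whether the improvements quoted above are absolute percentage points or relative gains is not stated in the abstract, so the sketch makes no claim about how such differences were derived.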
Source journal
Knowledge-Based Systems (Engineering & Technology: Computer Science, Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months
Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.