Progressive Feature Mining and External Knowledge-Assisted Text-Pedestrian Image Retrieval
Authors: Huafeng Li; Shedan Yang; Yafei Zhang; Dapeng Tao; Zhengtao Yu
Journal: IEEE Transactions on Multimedia, vol. 27, pp. 1973-1987 (published 2024-12-24)
DOI: 10.1109/TMM.2024.3521812
URL: https://ieeexplore.ieee.org/document/10814664/
Impact factor: 8.4; JCR: Q1 (Computer Science, Information Systems)
Text-pedestrian image retrieval uses a textual description of a pedestrian's appearance to identify the corresponding pedestrian image. The task is complicated by the modality discrepancy between text and images and by textual diversity: the same identity can be described in many different ways. Although text-pedestrian image retrieval has advanced, current methods do not comprehensively address these challenges. This paper therefore proposes a progressive feature mining and external knowledge-assisted feature purification method. Specifically, we implement a progressive mining mode that enables the model to extract discriminative features from information it would otherwise overlook. This strengthens the model's feature representation and prevents the loss of discriminative information. To further mitigate the effects of modality discrepancy and text diversity in cross-modal matching, we propose using external knowledge drawn from other samples of the same modality. This approach accentuates identity-consistent features and diminishes identity-inconsistent ones, refining the feature representation and reducing interference from textual diversity and from features correlated with negative samples of the same modality. Extensive experiments on three challenging datasets demonstrate the effectiveness of the proposed method; on large-scale datasets, its retrieval performance exceeds that of methods based on large-scale models.
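For readers unfamiliar with the task, the retrieval step itself can be illustrated with a minimal sketch. It assumes that a text query and each gallery image have already been encoded into vectors in a shared embedding space, and it ranks the images by cosine similarity to the query. This is a generic illustration of cross-modal retrieval, not the authors' method: the function names, image IDs, and embedding values are hypothetical, and the progressive mining and feature purification stages described in the abstract are not modeled here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(
    text_embedding: Sequence[float],
    gallery: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (image_id, similarity) pairs sorted from best to worst match."""
    scored = [
        (image_id, cosine_similarity(text_embedding, image_embedding))
        for image_id, image_embedding in gallery.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy, hand-made embeddings (hypothetical values, not taken from the paper).
query = [1.0, 0.0, 1.0]
gallery = {
    "img_a": [0.9, 0.1, 1.1],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [1.0, 1.0, 0.0],
}

ranking = rank_gallery(query, gallery)
print([image_id for image_id, _ in ranking])  # ['img_a', 'img_c', 'img_b']
```

In an actual system the embeddings would come from trained text and image encoders, and retrieval quality is typically reported as the fraction of queries whose correct image appears among the top-ranked results.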
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.