Person Text-Image Matching via Text-Feature Interpretability Embedding and External Attack Node Implantation

IF 5.3 | Zone 3 (Computer Science) | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Emerging Topics in Computational Intelligence | Pub Date: 2024-10-01 | DOI: 10.1109/TETCI.2024.3462817

Fan Li, Hang Zhou, Huafeng Li, Yafei Zhang, Zhengtao Yu

Volume 9, Issue 2, pp. 1202-1215

Citations: 0

Abstract

Person text-image matching, also known as text-based person search, aims to retrieve images of specific pedestrians using text descriptions. Although research on person text-image matching has made great progress, existing methods still face two challenges. First, text features lack interpretability, which makes it difficult to align them effectively with their corresponding image features. Second, the same pedestrian image often corresponds to multiple different text descriptions, and a single text description can correspond to multiple different images of the same identity. This diversity of text descriptions and images makes it difficult for a network to extract robust features that match the two modalities. To address these problems, we propose a person text-image matching method that embeds text-feature interpretability and an external attack node. Specifically, we improve the interpretability of text features by providing them with semantic information consistent with that of the image features, so that text features can be aligned with the image region features they describe. To address the challenges posed by the diversity of texts and the corresponding person images, we treat the feature variation caused by this diversity as the effect of perturbation information and propose a novel adversarial attack and defense method to handle it. In the model design, graph convolution serves as the basic framework for feature representation, and the adversarial attacks that text and image diversity impose on feature extraction are simulated by implanting an additional attack node in the graph convolution layer, which improves the model's robustness to text and image diversity. Extensive experiments demonstrate the effectiveness of the proposed method and its superiority over existing text-pedestrian image matching methods.
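The abstract describes the robustness mechanism only at a high level: graph convolution over feature nodes, with one extra attack node implanted in the layer to simulate diversity-induced perturbation. The following is a minimal, hypothetical Python sketch of that general idea, not the authors' implementation. The chain graph over three region nodes, the random node features, wiring the attack node to every original node, the L2-bounded random-search attack, and the mean-squared consistency loss are all assumptions made for illustration; helper names such as `implant_attack_node` and `worst_case_delta` are likewise hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize(adj: Matrix) -> Matrix:
    """Symmetric GCN normalisation D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(h: Matrix, adj: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(normalize(adj), h), w)
    return [[max(0.0, v) for v in row] for row in out]


def implant_attack_node(h: Matrix, adj: Matrix, delta: List[float]):
    """Hypothetical: append one extra node with features `delta`,
    linked to every original node (no self-edge; normalize adds it)."""
    n = len(h)
    h_att = [list(row) for row in h] + [list(delta)]
    adj_att = [list(row) + [1.0] for row in adj] + [[1.0] * n + [0.0]]
    return h_att, adj_att


def attacked_forward(h: Matrix, adj: Matrix, w: Matrix, delta: List[float]) -> Matrix:
    """Run the layer on the attacked graph, then drop the attack node's own output."""
    h_att, adj_att = implant_attack_node(h, adj, delta)
    return gcn_layer(h_att, adj_att, w)[:-1]


def consistency_loss(clean: Matrix, attacked: Matrix) -> float:
    """Mean squared difference between clean and attacked node features."""
    diffs = [(a - b) ** 2 for ra, rb in zip(clean, attacked) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def worst_case_delta(h, adj, w, eps, trials, rng):
    """Crude search-based attack (assumption): among `trials` random directions
    scaled to L2 norm `eps`, keep the one that disturbs the clean output most."""
    clean = gcn_layer(h, adj, w)
    best, best_loss = None, -1.0
    for _ in range(trials):
        raw = [rng.gauss(0.0, 1.0) for _ in range(len(h[0]))]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        delta = [eps * v / norm for v in raw]
        loss = consistency_loss(clean, attacked_forward(h, adj, w, delta))
        if loss > best_loss:
            best, best_loss = delta, loss
    return best, best_loss


# Toy usage: 3 hypothetical region nodes (e.g. head / torso / legs) in a chain graph.
rng = random.Random(0)
h = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
delta, loss = worst_case_delta(h, adj, w, eps=0.5, trials=20, rng=rng)
print(f"worst-case consistency loss: {loss:.4f}")
```

In a full min-max training loop (again an assumption), the attack step would search for the perturbation that maximizes this loss and the defense step would update the weights `w` to minimize it. Note that implanting the node also raises every original node's degree, so even a zero `delta` shifts the normalized output slightly.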




Source journal

IEEE Transactions on Emerging Topics in Computational Intelligence (subject category: Mathematics: Control and Optimization)

CiteScore: 10.30

Self-citation rate: 7.50%

Articles published: 147

Journal description: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronic-only publication and publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.