FRCL-MNER: A Finer Grained Rank-Based Contrastive Learning Framework for Multimodal NER

IF 8.9, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Neural Networks and Learning Systems. Pub Date: 2025-02-10. DOI: 10.1109/TNNLS.2025.3528567

Tianwei Yan; Shan Zhao; Wentao Ma; Shezheng Song; Chengyu Wang; Zhibo Rao; Shizhao Chen; Zhigang Luo; Xinwang Liu

Volume 36, Issue 6, pp. 10779-10793. https://ieeexplore.ieee.org/document/10879144/

Citations: 0

Abstract

Multimodal named entity recognition (MNER) is an emerging field that aims to automatically detect named entities and classify their categories, utilizing input text and auxiliary resources such as images. While previous studies have leveraged object detectors to preprocess images and fuse textual semantics with corresponding image features, these methods often overlook the potential finer grained information within each modality and may exacerbate error propagation due to predetection. To address these issues, we propose a finer grained rank-based contrastive learning (FRCL) framework for MNER. This framework employs global-level contrastive learning to align multimodal semantic features and a Top-K rank-based mask strategy to construct positive-negative pairs, thereby learning a finer grained multimodal interaction representation. Experimental results on three well-known social media datasets show that our approach surpasses existing strong baselines and achieves up to a 1.54% improvement on the Twitter2015 dataset. Extensive discussions further confirm the effectiveness of our approach. We will release the source code at https://github.com/augusyan/FRCL.
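
The two components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their repository would be the authoritative source). It assumes cosine similarity, a symmetric InfoNCE-style loss for the global-level alignment, and a plain "top K by similarity are positives" reading of the rank-based mask. The function names, the temperature of 0.07, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def cosine_matrix(rows, cols):
    """Cosine similarity between every vector in rows and every vector in cols."""
    rows = [normalize(r) for r in rows]
    cols = [normalize(c) for c in cols]
    return [[dot(r, c) for c in cols] for r in rows]


def global_contrastive_loss(text_vecs, image_vecs, temperature=0.07):
    """Symmetric InfoNCE-style loss (assumed form, not taken from the paper).

    The i-th text vector and the i-th image vector are treated as the
    positive pair; every other pairing in the batch is a negative.
    """
    sim = cosine_matrix(text_vecs, image_vecs)
    n = len(sim)

    def mean_nll(logit_rows):
        total = 0.0
        for i, row in enumerate(logit_rows):
            scaled = [s / temperature for s in row]
            m = max(scaled)  # subtract the max for numerical stability
            log_denom = m + math.log(sum(math.exp(s - m) for s in scaled))
            total += log_denom - scaled[i]
        return total / n

    sim_transposed = [list(col) for col in zip(*sim)]
    return 0.5 * (mean_nll(sim) + mean_nll(sim_transposed))


def topk_rank_mask(sim_row, k):
    """Mark the k highest-similarity candidates as positives (1), the rest as negatives (0)."""
    ranked = sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)
    positives = set(ranked[:k])
    return [1 if j in positives else 0 for j in range(len(sim_row))]


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]   # toy sentence-level text features
    image = [[0.9, 0.1], [0.2, 0.8]]  # toy image-level features, aligned by index
    print(global_contrastive_loss(text, image))
    print(topk_rank_mask([0.1, 0.7, 0.3, 0.9], k=2))  # [0, 1, 0, 1]
```

In this sketch the loss is lower when matching text and image vectors share an index than when the image batch is shuffled, which is the behavior an alignment objective is meant to reward. How the paper combines the mask with the contrastive objective is not stated in the abstract, so the two functions are kept separate here.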

Source journal

IEEE Transactions on Neural Networks and Learning Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore: 23.80
Self-citation rate: 9.60%
Articles published: 2102
Review time: 3-8 weeks

About the journal: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.