FRCL-MNER: A Finer Grained Rank-Based Contrastive Learning Framework for Multimodal NER

IF 8.9 · CAS Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Tianwei Yan;Shan Zhao;Wentao Ma;Shezheng Song;Chengyu Wang;Zhibo Rao;Shizhao Chen;Zhigang Luo;Xinwang Liu
{"title":"FRCL-MNER: A Finer Grained Rank-Based Contrastive Learning Framework for Multimodal NER","authors":"Tianwei Yan;Shan Zhao;Wentao Ma;Shezheng Song;Chengyu Wang;Zhibo Rao;Shizhao Chen;Zhigang Luo;Xinwang Liu","doi":"10.1109/TNNLS.2025.3528567","DOIUrl":null,"url":null,"abstract":"Multimodal named entity recognition (MNER) is an emerging field that aims to automatically detect named entities and classify their categories, utilizing input text and auxiliary resources such as images. While previous studies have leveraged object detectors to preprocess images and fuse textual semantics with corresponding image features, these methods often overlook the potential finer grained information within each modality and may exacerbate error propagation due to predetection. To address these issues, we propose a finer grained rank-based contrastive learning (FRCL) framework for MNER. This framework employs a global-level contrastive learning to align multimodal semantic features and a Top-K rank-based mask strategy to construct positive-negative pairs, thereby learning a finer grained multimodal interaction representation. Experimental results from three well-known social media datasets reveal that our approach surpasses existing strong baselines, and achieves up to a 1.54% improvement on the Twitter2015 dataset. Extensive discussions further confirm the effectiveness of our approach. We will release the source code on <uri>https://github.com/augusyan/FRCL</uri>.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"36 6","pages":"10779-10793"},"PeriodicalIF":8.9000,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10879144/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Multimodal named entity recognition (MNER) is an emerging field that aims to automatically detect named entities and classify their categories, utilizing input text and auxiliary resources such as images. While previous studies have leveraged object detectors to preprocess images and fuse textual semantics with corresponding image features, these methods often overlook the potential finer-grained information within each modality and may exacerbate error propagation due to pre-detection. To address these issues, we propose a finer-grained rank-based contrastive learning (FRCL) framework for MNER. This framework employs global-level contrastive learning to align multimodal semantic features and a Top-K rank-based mask strategy to construct positive-negative pairs, thereby learning a finer-grained multimodal interaction representation. Experimental results from three well-known social media datasets reveal that our approach surpasses existing strong baselines, and achieves up to a 1.54% improvement on the Twitter2015 dataset. Extensive discussions further confirm the effectiveness of our approach. We will release the source code on https://github.com/augusyan/FRCL.
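To make the two ingredients named in the abstract concrete, the following is a minimal sketch of how global-level text-image contrastive alignment combined with a Top-K rank-based mask over negatives could be implemented. It assumes PyTorch, an InfoNCE-style symmetric loss, and illustrative function names, feature shapes, and hyperparameters; it is not the authors' released implementation.

```python
# Hypothetical sketch: global contrastive alignment of text and image features
# with a Top-K rank-based mask that keeps only the K highest-similarity
# negatives per anchor. All names and details here are assumptions.
import torch
import torch.nn.functional as F


def topk_rank_mask(sim: torch.Tensor, k: int) -> torch.Tensor:
    """Keep, per row, the K highest-similarity off-diagonal entries
    (hard negatives) plus the diagonal (the matched positive pair)."""
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    neg_sim = sim.masked_fill(eye, float("-inf"))          # exclude positives
    topk_idx = neg_sim.topk(k, dim=-1).indices             # (B, K) hard negatives
    mask = torch.zeros_like(sim, dtype=torch.bool)
    mask.scatter_(1, topk_idx, True)
    return mask | eye


def global_contrastive_loss(text_feat, image_feat, k=8, temperature=0.07):
    """Symmetric InfoNCE over matched text-image pairs, restricted by the mask."""
    t = F.normalize(text_feat, dim=-1)
    v = F.normalize(image_feat, dim=-1)
    sim = t @ v.t() / temperature                           # (B, B) similarity logits
    mask = topk_rank_mask(sim, k=min(k, sim.size(0) - 1))
    sim = sim.masked_fill(~mask, float("-inf"))             # drop easy negatives
    labels = torch.arange(sim.size(0), device=sim.device)   # i-th text <-> i-th image
    return (F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels)) / 2


if __name__ == "__main__":
    text = torch.randn(16, 768)    # e.g. pooled BERT sentence features
    image = torch.randn(16, 768)   # e.g. projected ViT image features
    print(global_contrastive_loss(text, image).item())
```

In this reading, the rank-based mask serves to focus the contrastive objective on the hardest negatives rather than all in-batch pairs; the actual pair-construction and fusion details of FRCL-MNER should be taken from the paper and the released code.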
Source journal

IEEE Transactions on Neural Networks and Learning Systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Articles published: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.