MCIRP: A multi-granularity cross-modal interaction model based on relational propagation for Multimodal Named Entity Recognition with multiple images

IF 6.9 · CAS Tier 1 (Management Science) · JCR Q1, Computer Science, Information Systems
Yongheng Mu, Ziyu Guo, Xuewei Li, Lixu Shao, Shijun Liu, Feng Li, Guangxu Mei
DOI: 10.1016/j.ipm.2025.104384
Journal: Information Processing & Management, Volume 63, Issue 2, Article 104384
Published: 2025-09-15
URL: https://www.sciencedirect.com/science/article/pii/S0306457325003255
Citations: 0

Abstract

Most existing Multimodal Named Entity Recognition (MNER) methods typically focus on processing textual content with a single image and fail to effectively handle content with multiple images. Therefore, MNER with multiple images presents significant research potential. However, current approaches for this task face two key limitations: (1) Treating all images equally without assessing their relevance to the text, which may introduce visual noise from unrelated images; (2) Relying solely on coarse-grained image features while disregarding fine-grained alignments between text and each image. To address the above limitations, this work introduces a novel Multi-granularity Cross-modal Interaction Model based on Relational Propagation (MCIRP), which effectively leverages information from multiple images. For the first limitation, we propose a text–image relation propagation strategy that calculates the correlation score between the text and each image, enabling selective utilization of relevant image information. For the second limitation, we propose a multi-granularity cross-modal interaction fusion technique to facilitate the fusion of text and visual features at different levels of granularity. To the best of our knowledge, this is the first study to explore text–image relation propagation for the MNER task with multiple images. The results show that MCIRP improves the F1 scores on two MNER public datasets with multiple images (MNER-MI and MNER-MI-Plus) by 3.65% and 0.56%, respectively, achieving SOTA performance among existing multi-image methods.
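The paper's first contribution, text–image relation propagation, computes a correlation score between the text and each image so that only relevant images contribute visual features. The sketch below illustrates the general idea with a simple cosine-similarity score and thresholded weighting; the function names, the scoring choice, and the threshold are illustrative assumptions, not MCIRP's actual formulation.

```python
import numpy as np

def relevance_scores(text_emb, image_embs):
    """Cosine similarity between the text embedding and each image embedding.

    A hypothetical stand-in for MCIRP's text-image relation scoring:
    the paper computes a correlation score per image; cosine similarity
    is one simple way to obtain such a score.
    """
    t = text_emb / np.linalg.norm(text_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    return imgs @ t

def fuse(text_emb, image_embs, threshold=0.5):
    """Weight each image by its relevance score, dropping images below the
    threshold so unrelated images introduce no visual noise."""
    scores = relevance_scores(text_emb, image_embs)
    mask = scores >= threshold
    if not mask.any():
        return text_emb  # no relevant image: fall back to text features only
    weights = scores * mask
    weights = weights / weights.sum()
    visual = weights @ image_embs  # relevance-weighted visual feature
    return text_emb + visual
```

In this toy version, irrelevant images are masked out entirely rather than merely down-weighted; MCIRP's multi-granularity fusion additionally aligns text with each retained image at both coarse and fine granularity, which this sketch does not model.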
Source Journal

Information Processing & Management (Engineering & Technology — Computer Science: Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Articles per year: 276
Review time: 39 days
Journal description: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Its scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. The journal caters to both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field, with particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research.