CRISP: A cross-modal integration framework based on the surprisingly popular algorithm for multimodal named entity recognition

Impact factor 5.5; Zone 2, Computer Science; JCR Q1, Computer Science, Artificial Intelligence
Haitao Liu, Xianwei Xin, Jihua Song, Weiming Peng
Neurocomputing, Volume 614, Article 128792. Published 2024-10-28.
DOI: 10.1016/j.neucom.2024.128792
https://www.sciencedirect.com/science/article/pii/S0925231224015637
Citations: 0

Abstract

Multimodal named entity recognition on social media identifies named entities using both textual and visual information, and is of great significance for information processing. Nevertheless, many existing models still face two challenges. First, during cross-modal interaction, the attention mechanism sometimes focuses on trivial image regions that are irrelevant to the entities, which both neglects valuable information and introduces visual noise. Second, the gate mechanism is widely used to filter visual information and so reduce the influence of noise on text understanding, but it does not capture fine-grained semantic relevance between modalities, which can easily impair the filtering. To address these issues, we propose CRISP, a cross-modal integration framework based on the surprisingly popular algorithm, which aims to strengthen the integration of effective visual guidance and to reduce the interference of irrelevant visual noise. Specifically, we design a dual-branch interaction module that combines the attention mechanism with the surprisingly popular algorithm, allowing the model to focus on valuable but overlooked parts of the images. Furthermore, we compute the matching degree between modalities at multiple granularities and aggregate it with the Choquet integral, which gives a more reasonable basis for filtering out visual noise. Extensive experiments on public datasets demonstrate the advantages of our model.
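The abstract names two building blocks without spelling them out: the surprisingly popular algorithm and the Choquet integral. The sketch below shows only their textbook forms, under stated assumptions, and is not the CRISP implementation; the abstract does not say how either is adapted to image regions or cross-modal matching. The function names, the option labels "A" and "B", and the granularity names "fine" and "coarse" are hypothetical placeholders.

```python
from collections import Counter


def surprisingly_popular(votes, predictions):
    """Textbook surprisingly popular rule (illustration only).

    votes: list of chosen options, one per respondent.
    predictions: list of dicts, one per respondent, mapping each option
        to the share of votes that respondent expects it to receive.
    Returns the option whose actual vote share exceeds its mean predicted
    share by the largest margin, plus the per-option margins.
    """
    options = sorted(set(votes) | {o for p in predictions for o in p})
    counts = Counter(votes)
    surprise = {}
    for o in options:
        actual = counts[o] / len(votes)
        predicted = sum(p.get(o, 0.0) for p in predictions) / len(predictions)
        surprise[o] = actual - predicted
    winner = max(options, key=lambda o: surprise[o])
    return winner, surprise


def choquet_integral(scores, measure):
    """Discrete Choquet integral of non-negative scores (illustration only).

    scores: dict mapping a criterion name to its score.
    measure: dict mapping frozensets of criterion names to a fuzzy measure
        value (monotone, 1.0 on the full set). Only the coalitions reached
        while walking the sorted scores are looked up.
    """
    order = sorted(scores, key=scores.get)  # ascending by score
    total, previous = 0.0, 0.0
    for i, name in enumerate(order):
        coalition = frozenset(order[i:])  # criteria scoring at least this much
        total += (scores[name] - previous) * measure[coalition]
        previous = scores[name]
    return total


if __name__ == "__main__":
    # Six respondents pick "A", four pick "B"; everyone expects "A" to dominate.
    votes = ["A"] * 6 + ["B"] * 4
    predictions = [{"A": 0.9, "B": 0.1}] * 6 + [{"A": 0.7, "B": 0.3}] * 4
    print(surprisingly_popular(votes, predictions))

    # Hypothetical matching scores at two granularities and a toy fuzzy measure.
    scores = {"fine": 0.4, "coarse": 0.8}
    measure = {
        frozenset({"fine", "coarse"}): 1.0,
        frozenset({"coarse"}): 0.3,
        frozenset({"fine"}): 0.5,
    }
    print(choquet_integral(scores, measure))
```

In the first toy example the minority option wins because it is more popular than respondents predicted, which mirrors the abstract's idea of surfacing valuable but overlooked parts. In the second, the non-additive measure lets the aggregate depend on which scores are high together, something a fixed weighted average cannot express.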
Source journal: Neurocomputing
Category: Engineering and Technology, Computer Science: Artificial Intelligence
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.