MINIGE-MNER: A multi-stage interaction network inspired by gene editing for multimodal named entity recognition
Bo Kong, Shengquan Liu, Liruizhi Jia, Yi Liang, Dongfang Han, Xu Zhang
Neural Networks, Vol. 194, Article 108106. DOI: 10.1016/j.neunet.2025.108106. Published online 2025-09-12.
https://www.sciencedirect.com/science/article/pii/S0893608025009864
Citations: 0
Abstract
Multimodal Named Entity Recognition (MNER) integrates complementary information from text and images to identify named entities in text. However, existing methods face three key issues: imbalanced handling of modality noise, the cascading effect of semantic mismatch, and information loss resulting from a lack of text dominance. To address these issues, this paper proposes a Multi-stage Interaction Network Inspired by Gene Editing for MNER (MINIGE-MNER). The method has three core innovations. (1) A gene knockout module based on the variational information bottleneck removes inferior genes (modality noise) from the text, raw-image, and generated-image features while retaining the superior genes, achieving balanced filtering of modality noise. (2) A gene recombination site determination module maximizes the mutual information between superior genes across modalities, reducing the spatial distance between them and ensuring precise, fine-grained semantic alignment, which helps prevent the cascading effect of semantic mismatch. (3) A text-guided gene recombination module implements a "text-dominant, vision-supplementary" cross-modal fusion paradigm: it dynamically filters out visual noise unrelated to the text while avoiding excessive reliance on visual information that could obscure the text's unique contextual information, effectively mitigating information loss. Experimental results show that MINIGE-MNER achieves F1 scores of 76.45% and 88.67% on the Twitter-2015 and Twitter-2017 datasets, respectively, outperforming existing state-of-the-art methods by 0.83% and 0.42%. In addition, the paper presents comprehensive experiments that demonstrate the superiority of MINIGE-MNER and the effectiveness of its individual modules.
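The abstract does not give implementation details, but the three modules map onto familiar building blocks, so the following is a minimal, hypothetical PyTorch sketch of what such a pipeline could look like. It assumes that (i) the gene knockout stage is a variational information bottleneck that samples a compressed representation per modality, (ii) recombination-site determination is an InfoNCE-style mutual-information objective that pulls matched text and image features together, and (iii) text-guided recombination is cross-attention from text queries to visual keys/values with a learned gate. All class, function, and parameter names here are illustrative, not the authors'.

```python
# Hypothetical sketch of the three MINIGE-MNER stages (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeneKnockout(nn.Module):
    """Variational information bottleneck: keep 'superior genes', drop modality noise."""
    def __init__(self, dim, bottleneck):
        super().__init__()
        self.mu = nn.Linear(dim, bottleneck)
        self.logvar = nn.Linear(dim, bottleneck)

    def forward(self, x):                       # x: (B, L, dim)
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # reparameterization
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean() # compression penalty
        return z, kl

def recombination_site_loss(text_z, image_z, temperature=0.07):
    """InfoNCE-style lower bound on mutual information between pooled text/image features."""
    t = F.normalize(text_z.mean(dim=1), dim=-1)   # (B, D)
    v = F.normalize(image_z.mean(dim=1), dim=-1)  # (B, D)
    logits = t @ v.t() / temperature              # (B, B) similarity matrix
    labels = torch.arange(t.size(0), device=t.device)
    return F.cross_entropy(logits, labels)

class TextGuidedRecombination(nn.Module):
    """Text-dominant fusion: text queries attend to visual features, admitted via a gate."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, 1)

    def forward(self, text_z, image_z):
        vis, _ = self.attn(query=text_z, key=image_z, value=image_z)
        g = torch.sigmoid(self.gate(torch.cat([text_z, vis], dim=-1)))  # per-token visual weight
        return text_z + g * vis   # text stays dominant; vision is a gated supplement
```

Under these assumptions, a training loop would combine the downstream sequence-labeling loss for entity tags with the KL terms from each GeneKnockout instance and the recombination_site_loss, with weighting coefficients the paper itself would have to specify.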
About the Journal
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.