MINIGE-MNER: A multi-stage interaction network inspired by gene editing for multimodal named entity recognition
Bo Kong, Shengquan Liu, Liruizhi Jia, Yi Liang, Dongfang Han, Xu Zhang
Neural Networks, Vol. 194, Article 108106. DOI: 10.1016/j.neunet.2025.108106. Published online 2025-09-12.
https://www.sciencedirect.com/science/article/pii/S0893608025009864
Citations: 0
Abstract
Multimodal Named Entity Recognition (MNER) integrates complementary information from text and images to identify named entities in text. However, existing methods face three key issues: imbalanced handling of modality noise, the cascading effect of semantic mismatch, and information loss resulting from a lack of text dominance. To address these issues, this paper proposes a Multi-stage Interaction Network Inspired by Gene Editing for MNER (MINIGE-MNER). The method has three core innovations. (1) A gene knockout module based on the variational information bottleneck removes inferior genes (modality noise) from the text, raw-image, and generated-image features while retaining the superior genes, achieving balanced filtering of modality noise. (2) A gene recombination site determination module maximizes the mutual information between superior genes across modalities, reducing the spatial distance between them and ensuring precise, fine-grained semantic alignment, which helps prevent the cascading effect of semantic mismatch. (3) A text-guided gene recombination module implements a "text-dominant, vision-supplementary" cross-modal fusion paradigm: it dynamically filters out visual noise unrelated to the text while avoiding excessive reliance on visual information that could obscure the text's unique contextual information, effectively mitigating information loss. Experimental results show that MINIGE-MNER achieves F1 scores of 76.45% and 88.67% on the Twitter-2015 and Twitter-2017 datasets, respectively, outperforming existing state-of-the-art methods by 0.83% and 0.42%. In addition, the paper presents comprehensive experiments that demonstrate the superiority of MINIGE-MNER and the effectiveness of its individual modules.
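The abstract does not give implementation details, but the three modules map onto familiar building blocks, so the following is a minimal, hypothetical PyTorch sketch of what such a pipeline could look like. It assumes that (i) the gene knockout stage is a variational information bottleneck that samples a compressed representation per modality, (ii) recombination-site determination is an InfoNCE-style mutual-information objective that pulls matched text and image features together, and (iii) text-guided recombination is cross-attention from text queries to visual keys/values with a learned gate. All class, function, and parameter names here are illustrative, not the authors'.

```python
# Hypothetical sketch of the three MINIGE-MNER stages (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeneKnockout(nn.Module):
    """Variational information bottleneck: keep 'superior genes', drop modality noise."""
    def __init__(self, dim, bottleneck):
        super().__init__()
        self.mu = nn.Linear(dim, bottleneck)
        self.logvar = nn.Linear(dim, bottleneck)

    def forward(self, x):                       # x: (B, L, dim)
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # reparameterization
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean() # compression penalty
        return z, kl

def recombination_site_loss(text_z, image_z, temperature=0.07):
    """InfoNCE-style lower bound on mutual information between pooled text/image features."""
    t = F.normalize(text_z.mean(dim=1), dim=-1)   # (B, D)
    v = F.normalize(image_z.mean(dim=1), dim=-1)  # (B, D)
    logits = t @ v.t() / temperature              # (B, B) similarity matrix
    labels = torch.arange(t.size(0), device=t.device)
    return F.cross_entropy(logits, labels)

class TextGuidedRecombination(nn.Module):
    """Text-dominant fusion: text queries attend to visual features, admitted via a gate."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, 1)

    def forward(self, text_z, image_z):
        vis, _ = self.attn(query=text_z, key=image_z, value=image_z)
        g = torch.sigmoid(self.gate(torch.cat([text_z, vis], dim=-1)))  # per-token visual weight
        return text_z + g * vis   # text stays dominant; vision is a gated supplement
```

Under these assumptions, a training loop would combine the downstream sequence-labeling loss for entity tags with the KL terms from each GeneKnockout instance and the recombination_site_loss, with weighting coefficients the paper itself would have to specify.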
About the Journal
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.