Xinyu Liu , Guanglu Sun , Jing Jin , Fei Lang , Suxia Zhu
{"title":"联合多模态实体关系抽取的上下文驱动隐式三重推理","authors":"Xinyu Liu , Guanglu Sun , Jing Jin , Fei Lang , Suxia Zhu","doi":"10.1016/j.ipm.2025.104388","DOIUrl":null,"url":null,"abstract":"<div><div>Joint Multimodal Entity-Relation Extraction (JMERE) jointly models Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE), aiming to extract valuable structured information from multimodal input. However, existing JMERE methods struggle to fully leverage bidirectional semantic interactions between tasks. To this end, this paper proposes a context-driven implicit triple reasoning framework (CITR), which takes type triples composed of entity and relation types as the foundation. Specifically, CITR first uses context generated by large multimodal models (LMMs) as semantic guidance cues to enhance modality representations, and prevent excessive semantic bias through a constraint module. Subsequently, CITR models the complex dependencies of different type triples to iteratively refine the representations associated with implicit triples. Finally, this paper reformulates the JMERE task as a type triple-centric sequence labeling problem and designs a dual-sequence joint tagging scheme, which reduces the computational complexity and label sparsity compared to previous schemes. Experimental results show that CITR achieves F1 score of 58.02% on the JMERE (Joint) task, significantly outperforming the state-of-the-art methods by 0.99%. 
Compared to methods with LMMs, CITR using LLaVA-1.5 achieves a superior F1 score of 58.49%.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104388"},"PeriodicalIF":6.9000,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CITR: Context-driven implicit triple reasoning for joint multimodal entity-relation extraction\",\"authors\":\"Xinyu Liu , Guanglu Sun , Jing Jin , Fei Lang , Suxia Zhu\",\"doi\":\"10.1016/j.ipm.2025.104388\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Joint Multimodal Entity-Relation Extraction (JMERE) jointly models Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE), aiming to extract valuable structured information from multimodal input. However, existing JMERE methods struggle to fully leverage bidirectional semantic interactions between tasks. To this end, this paper proposes a context-driven implicit triple reasoning framework (CITR), which takes type triples composed of entity and relation types as the foundation. Specifically, CITR first uses context generated by large multimodal models (LMMs) as semantic guidance cues to enhance modality representations, and prevent excessive semantic bias through a constraint module. Subsequently, CITR models the complex dependencies of different type triples to iteratively refine the representations associated with implicit triples. Finally, this paper reformulates the JMERE task as a type triple-centric sequence labeling problem and designs a dual-sequence joint tagging scheme, which reduces the computational complexity and label sparsity compared to previous schemes. Experimental results show that CITR achieves F1 score of 58.02% on the JMERE (Joint) task, significantly outperforming the state-of-the-art methods by 0.99%. 
Compared to methods with LMMs, CITR using LLaVA-1.5 achieves a superior F1 score of 58.49%.</div></div>\",\"PeriodicalId\":50365,\"journal\":{\"name\":\"Information Processing & Management\",\"volume\":\"63 2\",\"pages\":\"Article 104388\"},\"PeriodicalIF\":6.9000,\"publicationDate\":\"2025-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Processing & Management\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0306457325003292\",\"RegionNum\":1,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325003292","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
CITR: Context-driven implicit triple reasoning for joint multimodal entity-relation extraction
Joint Multimodal Entity-Relation Extraction (JMERE) jointly models Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE), aiming to extract valuable structured information from multimodal input. However, existing JMERE methods struggle to fully leverage the bidirectional semantic interactions between the two tasks. To this end, this paper proposes a context-driven implicit triple reasoning framework (CITR), which takes type triples composed of entity and relation types as its foundation. Specifically, CITR first uses context generated by large multimodal models (LMMs) as semantic guidance cues to enhance modality representations, and prevents excessive semantic bias through a constraint module. Subsequently, CITR models the complex dependencies among different type triples to iteratively refine the representations associated with implicit triples. Finally, this paper reformulates the JMERE task as a type triple-centric sequence labeling problem and designs a dual-sequence joint tagging scheme, which reduces computational complexity and label sparsity compared to previous schemes. Experimental results show that CITR achieves an F1 score of 58.02% on the JMERE (Joint) task, significantly outperforming state-of-the-art methods by 0.99%. Compared with other LMM-based methods, CITR using LLaVA-1.5 achieves a superior F1 score of 58.49%.
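The abstract's idea of casting extraction as type triple-centric sequence labeling can be illustrated with a minimal decoding sketch. The paper does not publish its tagging details here, so everything below is an assumption: a hypothetical dual-sequence scheme in which, for each candidate type triple (head type, relation, tail type), one BIO sequence marks head-entity spans and a second marks tail-entity spans, and decoded spans are paired into (head, relation, tail) triples.

```python
# Hypothetical sketch of a dual-sequence joint tagging scheme for JMERE.
# NOT the paper's actual scheme: the type triple, label sets, and pairing
# rule below are illustrative assumptions.

def decode_spans(tokens, labels):
    """Decode contiguous BIO-tagged spans into surface strings."""
    spans, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B":                      # span starts: flush any open span
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif lab == "I" and current:        # span continues
            current.append(tok)
        else:                               # "O" (or stray "I"): close span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

def decode_triples(tokens, head_labels, tail_labels, type_triple):
    """Pair head and tail spans decoded under one type triple."""
    _head_type, relation, _tail_type = type_triple
    heads = decode_spans(tokens, head_labels)
    tails = decode_spans(tokens, tail_labels)
    # Simplistic cross-product pairing; a real scheme would be stricter.
    return [(h, relation, t) for h in heads for t in tails]

tokens      = ["Steve", "Jobs", "founded", "Apple", "in", "California"]
head_labels = ["B", "I", "O", "O", "O", "O"]   # head-entity sequence
tail_labels = ["O", "O", "O", "O", "O", "B"]   # tail-entity sequence
triples = decode_triples(tokens, head_labels, tail_labels,
                         ("PER", "place_of_residence", "LOC"))
# → [("Steve Jobs", "place_of_residence", "California")]
```

Labeling per type triple rather than per token-pair is what keeps the label space small: each sequence carries only span tags, so complexity grows with the number of candidate type triples instead of the full entity-relation cross product.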
Journal introduction:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.