{"title":"MgHiSal:基于多模态知识图补全的mlm引导分层语义对齐","authors":"Jie Chen , Wuyang Zhang , Shu Zhao , Yunxia Yin","doi":"10.1016/j.knosys.2025.114552","DOIUrl":null,"url":null,"abstract":"<div><div>Multimodal Knowledge Graph Completion (MMKGC) aims to improve link prediction and empower downstream applications like intelligent question answering, reasoning, and recommendation by integrating multimodal information. However, existing methods commonly face semantic fragmentation between visual and textual modalities. This issue largely stems from the reliance on pre-trained vision models, which often struggle to model deep entity semantics, resulting in noisy visual features and the neglect of key semantic information. To address this, we propose MgHiSal, an MLLM-guided Hierarchical Semantic Alignment framework for MMKGC. MgHiSal first generates context-aware visual descriptions by conditioning MLLM generation on existing entity text, ensuring low-noise, relevant representations for initial semantic alignment. The framework then utilizes a hierarchical gated attention mechanism that progressively unifies multimodal representations by dynamically selecting and optimizing key cross-modal features via regularization. Finally, a neighbor-aware module enhances entity representations by aggregating multimodal neighbor information. Experiments on DB15K and MKG-W show MgHiSal significantly improves MRR by approximately 13.1 % and 12.5 % over respective runner-ups. The source code is publicly available at <span><span>https://github.com/wyZhang016/MgHiSal</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114552"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MgHiSal: MLLM-guided hierarchical semantic alignment for multimodal knowledge graph completion\",\"authors\":\"Jie Chen , Wuyang Zhang , Shu Zhao , Yunxia Yin\",\"doi\":\"10.1016/j.knosys.2025.114552\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Multimodal Knowledge Graph Completion (MMKGC) aims to improve link prediction and empower downstream applications like intelligent question answering, reasoning, and recommendation by integrating multimodal information. However, existing methods commonly face semantic fragmentation between visual and textual modalities. This issue largely stems from the reliance on pre-trained vision models, which often struggle to model deep entity semantics, resulting in noisy visual features and the neglect of key semantic information. To address this, we propose MgHiSal, an MLLM-guided Hierarchical Semantic Alignment framework for MMKGC. MgHiSal first generates context-aware visual descriptions by conditioning MLLM generation on existing entity text, ensuring low-noise, relevant representations for initial semantic alignment. The framework then utilizes a hierarchical gated attention mechanism that progressively unifies multimodal representations by dynamically selecting and optimizing key cross-modal features via regularization. Finally, a neighbor-aware module enhances entity representations by aggregating multimodal neighbor information. Experiments on DB15K and MKG-W show MgHiSal significantly improves MRR by approximately 13.1 % and 12.5 % over respective runner-ups. 
The source code is publicly available at <span><span>https://github.com/wyZhang016/MgHiSal</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"330 \",\"pages\":\"Article 114552\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-09-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125015916\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125015916","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
MgHiSal: MLLM-guided hierarchical semantic alignment for multimodal knowledge graph completion
Multimodal Knowledge Graph Completion (MMKGC) aims to improve link prediction and empower downstream applications like intelligent question answering, reasoning, and recommendation by integrating multimodal information. However, existing methods commonly face semantic fragmentation between visual and textual modalities. This issue largely stems from the reliance on pre-trained vision models, which often struggle to model deep entity semantics, resulting in noisy visual features and the neglect of key semantic information. To address this, we propose MgHiSal, an MLLM-guided Hierarchical Semantic Alignment framework for MMKGC. MgHiSal first generates context-aware visual descriptions by conditioning MLLM generation on existing entity text, ensuring low-noise, relevant representations for initial semantic alignment. The framework then utilizes a hierarchical gated attention mechanism that progressively unifies multimodal representations by dynamically selecting and optimizing key cross-modal features via regularization. Finally, a neighbor-aware module enhances entity representations by aggregating multimodal neighbor information. Experiments on DB15K and MKG-W show MgHiSal significantly improves MRR by approximately 13.1 % and 12.5 % over respective runner-ups. The source code is publicly available at https://github.com/wyZhang016/MgHiSal.
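The abstract describes a gated attention mechanism that fuses textual and MLLM-derived visual representations while regularizing which cross-modal features are kept. The following is a minimal, illustrative sketch (not the authors' implementation) of that kind of gated cross-modal fusion: a learned gate decides how much of the visual signal to mix into the textual entity embedding, and a simple penalty on the gate stands in for the regularization mentioned. All names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn


class GatedModalFusion(nn.Module):
    """Hypothetical gated fusion of textual and visual entity embeddings."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # The gate is computed from the concatenated modalities and lies in [0, 1].
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.proj_visual = nn.Linear(dim, dim)

    def forward(self, text_emb: torch.Tensor, vis_emb: torch.Tensor):
        g = self.gate(torch.cat([text_emb, vis_emb], dim=-1))  # (batch, dim)
        fused = text_emb + g * self.proj_visual(vis_emb)       # gate-weighted visual injection
        reg = g.pow(2).mean()                                  # stand-in regularizer on the gate
        return fused, reg


if __name__ == "__main__":
    fusion = GatedModalFusion(dim=256)
    text = torch.randn(8, 256)  # textual entity embeddings
    vis = torch.randn(8, 256)   # embeddings of MLLM-generated visual descriptions
    fused, reg = fusion(text, vis)
    print(fused.shape, reg.item())
```

The gate lets the model suppress noisy visual features per dimension rather than discarding the visual modality entirely, which is the intuition behind the selective fusion the abstract refers to.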
Journal introduction:
Knowledge-Based Systems is an international and interdisciplinary journal in artificial intelligence that publishes original, innovative, and creative research results. It focuses on systems built with knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practical studies, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.