MgHiSal: MLLM-guided hierarchical semantic alignment for multimodal knowledge graph completion

IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jie Chen, Wuyang Zhang, Shu Zhao, Yunxia Yin
{"title":"MgHiSal: MLLM-guided hierarchical semantic alignment for multimodal knowledge graph completion","authors":"Jie Chen ,&nbsp;Wuyang Zhang ,&nbsp;Shu Zhao ,&nbsp;Yunxia Yin","doi":"10.1016/j.knosys.2025.114552","DOIUrl":null,"url":null,"abstract":"<div><div>Multimodal Knowledge Graph Completion (MMKGC) aims to improve link prediction and empower downstream applications like intelligent question answering, reasoning, and recommendation by integrating multimodal information. However, existing methods commonly face semantic fragmentation between visual and textual modalities. This issue largely stems from the reliance on pre-trained vision models, which often struggle to model deep entity semantics, resulting in noisy visual features and the neglect of key semantic information. To address this, we propose MgHiSal, an MLLM-guided Hierarchical Semantic Alignment framework for MMKGC. MgHiSal first generates context-aware visual descriptions by conditioning MLLM generation on existing entity text, ensuring low-noise, relevant representations for initial semantic alignment. The framework then utilizes a hierarchical gated attention mechanism that progressively unifies multimodal representations by dynamically selecting and optimizing key cross-modal features via regularization. Finally, a neighbor-aware module enhances entity representations by aggregating multimodal neighbor information. Experiments on DB15K and MKG-W show MgHiSal significantly improves MRR by approximately 13.1 % and 12.5 % over respective runner-ups. The source code is publicly available at <span><span>https://github.com/wyZhang016/MgHiSal</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114552"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125015916","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Multimodal Knowledge Graph Completion (MMKGC) aims to improve link prediction and empower downstream applications such as intelligent question answering, reasoning, and recommendation by integrating multimodal information. However, existing methods commonly suffer from semantic fragmentation between the visual and textual modalities. This issue largely stems from reliance on pre-trained vision models, which often struggle to model deep entity semantics, resulting in noisy visual features and the neglect of key semantic information. To address this, we propose MgHiSal, an MLLM-guided Hierarchical Semantic Alignment framework for MMKGC. MgHiSal first generates context-aware visual descriptions by conditioning MLLM generation on existing entity text, ensuring low-noise, relevant representations for initial semantic alignment. The framework then applies a hierarchical gated attention mechanism that progressively unifies multimodal representations by dynamically selecting and optimizing key cross-modal features via regularization. Finally, a neighbor-aware module enhances entity representations by aggregating multimodal neighbor information. Experiments on DB15K and MKG-W show that MgHiSal improves MRR by approximately 13.1% and 12.5%, respectively, over the runner-up methods. The source code is publicly available at https://github.com/wyZhang016/MgHiSal.
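To make the gated fusion idea concrete, below is a minimal, hypothetical PyTorch sketch of a gated cross-modal fusion layer in the spirit of the hierarchical gated attention described in the abstract. All names (GatedFusion, text_dim, vis_dim, out_dim) are illustrative assumptions and are not taken from the MgHiSal codebase; the authors' actual implementation is in the linked repository.

```python
# A minimal, hypothetical sketch of gated cross-modal fusion in PyTorch.
# Names here are illustrative, NOT from the MgHiSal repository
# (https://github.com/wyZhang016/MgHiSal), which holds the real code.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Fuse textual and visual entity features with a learned gate.

    The gate decides, per dimension, how much of the visual signal to
    admit, loosely mirroring the paper's idea of dynamically selecting
    key cross-modal features.
    """

    def __init__(self, text_dim: int, vis_dim: int, out_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, out_dim)
        self.vis_proj = nn.Linear(vis_dim, out_dim)
        self.gate = nn.Sequential(
            nn.Linear(2 * out_dim, out_dim),
            nn.Sigmoid(),  # per-dimension weight in [0, 1]
        )

    def forward(self, text_feat: torch.Tensor, vis_feat: torch.Tensor) -> torch.Tensor:
        t = self.text_proj(text_feat)
        v = self.vis_proj(vis_feat)
        g = self.gate(torch.cat([t, v], dim=-1))
        return g * v + (1.0 - g) * t  # gated convex combination


# Toy usage: 4 entities, 768-d text features, 512-d visual features.
fusion = GatedFusion(text_dim=768, vis_dim=512, out_dim=256)
fused = fusion(torch.randn(4, 768), torch.randn(4, 512))
print(fused.shape)  # torch.Size([4, 256])
```

A sigmoid gate over the concatenated projections is one common way to realize such dynamic feature selection; the paper additionally applies regularization and stacks the mechanism hierarchically, which this sketch omits.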
Source journal: Knowledge-Based Systems (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Annual publications: 1245
Review time: 7.8 months
Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.