MgHiSal: MLLM-guided hierarchical semantic alignment for multimodal knowledge graph completion

IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jie Chen, Wuyang Zhang, Shu Zhao, Yunxia Yin
{"title":"MgHiSal: MLLM-guided hierarchical semantic alignment for multimodal knowledge graph completion","authors":"Jie Chen ,&nbsp;Wuyang Zhang ,&nbsp;Shu Zhao ,&nbsp;Yunxia Yin","doi":"10.1016/j.knosys.2025.114552","DOIUrl":null,"url":null,"abstract":"<div><div>Multimodal Knowledge Graph Completion (MMKGC) aims to improve link prediction and empower downstream applications like intelligent question answering, reasoning, and recommendation by integrating multimodal information. However, existing methods commonly face semantic fragmentation between visual and textual modalities. This issue largely stems from the reliance on pre-trained vision models, which often struggle to model deep entity semantics, resulting in noisy visual features and the neglect of key semantic information. To address this, we propose MgHiSal, an MLLM-guided Hierarchical Semantic Alignment framework for MMKGC. MgHiSal first generates context-aware visual descriptions by conditioning MLLM generation on existing entity text, ensuring low-noise, relevant representations for initial semantic alignment. The framework then utilizes a hierarchical gated attention mechanism that progressively unifies multimodal representations by dynamically selecting and optimizing key cross-modal features via regularization. Finally, a neighbor-aware module enhances entity representations by aggregating multimodal neighbor information. Experiments on DB15K and MKG-W show MgHiSal significantly improves MRR by approximately 13.1 % and 12.5 % over respective runner-ups. The source code is publicly available at <span><span>https://github.com/wyZhang016/MgHiSal</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114552"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125015916","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Multimodal Knowledge Graph Completion (MMKGC) aims to improve link prediction and empower downstream applications such as intelligent question answering, reasoning, and recommendation by integrating multimodal information. However, existing methods commonly suffer from semantic fragmentation between the visual and textual modalities. This issue largely stems from reliance on pre-trained vision models, which often struggle to model deep entity semantics, resulting in noisy visual features and the neglect of key semantic information. To address this, we propose MgHiSal, an MLLM-guided Hierarchical Semantic Alignment framework for MMKGC. MgHiSal first generates context-aware visual descriptions by conditioning MLLM generation on existing entity text, ensuring low-noise, relevant representations for initial semantic alignment. The framework then applies a hierarchical gated attention mechanism that progressively unifies multimodal representations by dynamically selecting and optimizing key cross-modal features via regularization. Finally, a neighbor-aware module enhances entity representations by aggregating multimodal neighbor information. Experiments on DB15K and MKG-W show that MgHiSal improves MRR by approximately 13.1% and 12.5%, respectively, over the runner-up methods. The source code is publicly available at https://github.com/wyZhang016/MgHiSal.
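To make the gated fusion idea concrete, below is a minimal, hypothetical PyTorch sketch of a gated cross-modal fusion layer in the spirit of the hierarchical gated attention described in the abstract. All names (GatedFusion, text_dim, vis_dim, out_dim) are illustrative assumptions and are not taken from the MgHiSal codebase; the authors' actual implementation is in the linked repository.

```python
# A minimal, hypothetical sketch of gated cross-modal fusion in PyTorch.
# Names here are illustrative, NOT from the MgHiSal repository
# (https://github.com/wyZhang016/MgHiSal), which holds the real code.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Fuse textual and visual entity features with a learned gate.

    The gate decides, per dimension, how much of the visual signal to
    admit, loosely mirroring the paper's idea of dynamically selecting
    key cross-modal features.
    """

    def __init__(self, text_dim: int, vis_dim: int, out_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, out_dim)
        self.vis_proj = nn.Linear(vis_dim, out_dim)
        self.gate = nn.Sequential(
            nn.Linear(2 * out_dim, out_dim),
            nn.Sigmoid(),  # per-dimension weight in [0, 1]
        )

    def forward(self, text_feat: torch.Tensor, vis_feat: torch.Tensor) -> torch.Tensor:
        t = self.text_proj(text_feat)
        v = self.vis_proj(vis_feat)
        g = self.gate(torch.cat([t, v], dim=-1))
        return g * v + (1.0 - g) * t  # gated convex combination


# Toy usage: 4 entities, 768-d text features, 512-d visual features.
fusion = GatedFusion(text_dim=768, vis_dim=512, out_dim=256)
fused = fusion(torch.randn(4, 768), torch.randn(4, 512))
print(fused.shape)  # torch.Size([4, 256])
```

A sigmoid gate over the concatenated projections is one common way to realize such dynamic feature selection; the paper additionally applies regularization and stacks the mechanism hierarchically, which this sketch omits.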
Source journal: Knowledge-Based Systems (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Annual publications: 1245
Review time: 7.8 months
Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.