MKNNet: Knowledge-aligned multimodal transformer for information retrieval

IF 6.8 | Zone 2 (Engineering and Technology) | JCR Q1 (Engineering, Multidisciplinary)
Xiaoqin Lin, Chentao Han, Jian Yao, Yue Li, Xujun Wang, Shufeng Jia
Alexandria Engineering Journal, vol. 127, pp. 1029-1039. Journal article, published 2025-07-12.
DOI: 10.1016/j.aej.2025.06.055
https://www.sciencedirect.com/science/article/pii/S1110016825008051
Citations: 0

Abstract

With the rapid advancement of artificial intelligence and the Internet of Things, data collected from multiple sensing modalities is growing rapidly in both volume and complexity. In this paper, we propose a novel deep learning framework called MKNNet, which combines modality alignment, Transformer-based fusion, and multi-loss optimization to construct a unified semantic embedding space for multimodal information retrieval. Our model leverages modality-specific encoders and attention-based fusion to achieve deep semantic consistency across modalities. Experimental results on MS-COCO and Flickr30K datasets demonstrate that MKNNet significantly outperforms state-of-the-art models such as CLIP and BLIP in terms of Recall and mAP. The proposed method enhances semantic alignment and retrieval accuracy, showing great potential for applications in smart cities, healthcare, and other multimodal Internet of Things scenarios.
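
The abstract reports results in terms of Recall, but this page does not give the formula or the evaluation protocol. As a rough illustration only, the sketch below computes Recall@K the way it is commonly defined for cross-modal retrieval in a shared embedding space: a query counts as a hit if its paired item is among the K most similar gallery items. The function names, the cosine-similarity choice, and the toy embeddings are assumptions made for this example, not details taken from MKNNet.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def recall_at_k(query_emb: np.ndarray, gallery_emb: np.ndarray, k: int) -> float:
    """Fraction of queries whose paired gallery item (same row index)
    appears among the k most similar gallery items."""
    sims = l2_normalize(query_emb) @ l2_normalize(gallery_emb).T
    # Indices of the k highest-scoring gallery items for each query.
    top_k = np.argsort(-sims, axis=1)[:, :k]
    targets = np.arange(sims.shape[0])[:, None]
    return float((top_k == targets).any(axis=1).mean())


# Hypothetical toy data: 4 paired "image" and "text" embeddings in a shared 3-d space.
image_emb = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0],
                      [1.0, 1.0, 0.0]])
text_emb = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.9, 0.0],
                     [0.0, 0.1, 0.9],
                     [0.0, 0.0, 1.0]])  # last text is deliberately closer to image 2 than to image 3

r1 = recall_at_k(text_emb, image_emb, k=1)  # text-to-image Recall@1
r4 = recall_at_k(text_emb, image_emb, k=4)  # k equals the gallery size
print(f"Recall@1 = {r1:.2f}, Recall@4 = {r4:.2f}")
```

On this toy data three of the four text queries rank their paired image first, so Recall@1 is 0.75, and Recall@4 is 1.0 because K covers the whole gallery. Real benchmarks such as MS-COCO pair each image with several captions, which this one-to-one sketch does not model.
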
Source journal
Alexandria Engineering Journal (Engineering: General Engineering)
CiteScore: 11.20
Self-citation rate: 4.40%
Publication volume: 1015
Review time: 43 days
Journal description: Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the field of engineering and applied science. Alexandria Engineering Journal is cited in the Engineering Information Services (EIS) and the Chemical Abstracts (CA). The papers published in Alexandria Engineering Journal are grouped into five sections, according to the following classification:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering