Xiaoqin Lin, Chentao Han, Jian Yao, Yue Li, Xujun Wang, Shufeng Jia
{"title":"面向信息检索的知识对齐多模态转换器","authors":"Xiaoqin Lin , Chentao Han , Jian Yao , Yue Li , Xujun Wang , Shufeng Jia","doi":"10.1016/j.aej.2025.06.055","DOIUrl":null,"url":null,"abstract":"<div><div>With the rapid advancement of artificial intelligence and the Internet of Things, data collected from multiple sensing modalities is growing rapidly in both volume and complexity. In this paper, we propose a novel deep learning framework called MKNNet, which combines modality alignment, Transformer-based fusion, and multi-loss optimization to construct a unified semantic embedding space for multimodal information retrieval. Our model leverages modality-specific encoders and attention-based fusion to achieve deep semantic consistency across modalities. Experimental results on MS-COCO and Flickr30K datasets demonstrate that MKNNet significantly outperforms state-of-the-art models such as CLIP and BLIP in terms of Recall and mAP. The proposed method enhances semantic alignment and retrieval accuracy, showing great potential for applications in smart cities, healthcare, and other multimodal Internet of Things scenarios.</div></div>","PeriodicalId":7484,"journal":{"name":"alexandria engineering journal","volume":"127 ","pages":"Pages 1029-1039"},"PeriodicalIF":6.8000,"publicationDate":"2025-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MKNNet: Knowledge-aligned multimodal transformer for information retrieval\",\"authors\":\"Xiaoqin Lin , Chentao Han , Jian Yao , Yue Li , Xujun Wang , Shufeng Jia\",\"doi\":\"10.1016/j.aej.2025.06.055\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the rapid advancement of artificial intelligence and the Internet of Things, data collected from multiple sensing modalities is growing rapidly in both volume and complexity. In this paper, we propose a novel deep learning framework called MKNNet, which combines modality alignment, Transformer-based fusion, and multi-loss optimization to construct a unified semantic embedding space for multimodal information retrieval. Our model leverages modality-specific encoders and attention-based fusion to achieve deep semantic consistency across modalities. Experimental results on MS-COCO and Flickr30K datasets demonstrate that MKNNet significantly outperforms state-of-the-art models such as CLIP and BLIP in terms of Recall and mAP. 
The proposed method enhances semantic alignment and retrieval accuracy, showing great potential for applications in smart cities, healthcare, and other multimodal Internet of Things scenarios.</div></div>\",\"PeriodicalId\":7484,\"journal\":{\"name\":\"alexandria engineering journal\",\"volume\":\"127 \",\"pages\":\"Pages 1029-1039\"},\"PeriodicalIF\":6.8000,\"publicationDate\":\"2025-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"alexandria engineering journal\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1110016825008051\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"alexandria engineering journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1110016825008051","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
MKNNet: Knowledge-aligned multimodal transformer for information retrieval
With the rapid advancement of artificial intelligence and the Internet of Things, data collected from multiple sensing modalities is growing in both volume and complexity. In this paper, we propose a novel deep learning framework called MKNNet, which combines modality alignment, Transformer-based fusion, and multi-loss optimization to construct a unified semantic embedding space for multimodal information retrieval. Our model leverages modality-specific encoders and attention-based fusion to achieve deep semantic consistency across modalities. Experimental results on the MS-COCO and Flickr30K datasets demonstrate that MKNNet significantly outperforms state-of-the-art models such as CLIP and BLIP in terms of recall and mAP. The proposed method improves semantic alignment and retrieval accuracy, showing strong potential for applications in smart cities, healthcare, and other multimodal Internet of Things scenarios.
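The abstract names three components: modality-specific encoders, Transformer-based fusion, and a multi-loss objective over a shared embedding space. The sketch below illustrates how such a pipeline could fit together in PyTorch. It is a minimal sketch under stated assumptions, not the paper's actual implementation: the class name MKNNetSketch, all dimensions, the pooling scheme, and the choice of a symmetric InfoNCE alignment loss as one component of the multi-loss objective are illustrative assumptions, since the abstract does not specify them.

```python
# Minimal sketch of the pipeline the abstract describes: modality-specific
# encoders, Transformer-based fusion, and a contrastive alignment loss over
# a shared embedding space. Names, dimensions, and loss choices are
# assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MKNNetSketch(nn.Module):
    def __init__(self, img_feat_dim=2048, txt_vocab=30522, d_model=512):
        super().__init__()
        # Modality-specific encoders projecting into a shared d_model space.
        self.img_proj = nn.Linear(img_feat_dim, d_model)      # e.g. CNN region features
        self.txt_embed = nn.Embedding(txt_vocab, d_model)     # token embeddings
        self.txt_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        # Transformer-based fusion: self-attention over both token sequences
        # lets image regions and text tokens attend to each other.
        self.fusion = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # learnable temperature

    def encode(self, img_feats, txt_ids):
        img = self.img_proj(img_feats)                    # (B, R, d)
        txt = self.txt_encoder(self.txt_embed(txt_ids))   # (B, T, d)
        fused = self.fusion(torch.cat([img, txt], dim=1)) # (B, R+T, d)
        # Pool each modality's fused tokens into one retrieval embedding.
        img_emb = F.normalize(fused[:, :img.size(1)].mean(1), dim=-1)
        txt_emb = F.normalize(fused[:, img.size(1):].mean(1), dim=-1)
        return img_emb, txt_emb

def alignment_loss(img_emb, txt_emb, scale):
    # Symmetric InfoNCE loss: one plausible component of a multi-loss
    # objective; matched image-text pairs sit on the diagonal.
    logits = scale.exp() * img_emb @ txt_emb.t()
    target = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, target)
            + F.cross_entropy(logits.t(), target)) / 2

# Toy usage: 4 image-text pairs, 36 region features each, 16 text tokens.
model = MKNNetSketch()
img_emb, txt_emb = model.encode(torch.randn(4, 36, 2048),
                                torch.randint(0, 30522, (4, 16)))
loss = alignment_loss(img_emb, txt_emb, model.logit_scale)
```

At retrieval time, queries and candidates encoded this way would be ranked by cosine similarity of their normalized embeddings, which is exactly the quantity the contrastive loss optimizes; recall@K and mAP, the metrics reported in the abstract, are then computed over that ranking.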
Journal Introduction:
Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the fields of engineering and applied science. The journal is indexed in Engineering Information Services (EIS) and Chemical Abstracts (CA). Papers published in Alexandria Engineering Journal are grouped into five sections, according to the following classification:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering