SEER: Knowledge-driven semantic image restoration with vision-language diffusion alignment

IF 7.6; Zone 1 (Computer Science); Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems. Pub Date: 2025-09-12. DOI: 10.1016/j.knosys.2025.114464

Shengliang Wu, Jun Jiang, Xin He, Yong Xu, Yujun Zhu, Weiwei Jiang, Heju Li

Knowledge-Based Systems, Volume 329, Article 114464. Journal Article. Source page: https://www.sciencedirect.com/science/article/pii/S0950705125015035

Citations: 0

Abstract


SEER: Knowledge-driven semantic image restoration with vision-language diffusion alignment

Semantic communication is an emerging paradigm to enhance network efficiency and perceptual quality, particularly demonstrating strong potential in image generation tasks. However, existing deep learning (DL)-based single-modal reconstruction approaches often suffer from semantic distortion and image blurring under bandwidth-limited and highly noisy channel conditions, limiting their suitability in task-oriented perception scenarios. Although generative AI-based semantic communication can significantly reduce data transmission volume, its high sensitivity to channel noise and lack of dynamic adaptation mechanisms limit the stability of reconstruction. To address these challenges, this paper proposes a multi-modal semantic communication framework named SEER, designed for resource-constrained intelligent sensing terminals. Built upon a pretrained language model, SEER incorporates a channel-aware prompt control strategy, a dual-modal integrative semantic restoration mechanism (DISR), and a single-pass sequential cross-modal reconstruction pathway to achieve collaborative semantic representation and robust structural recovery between images and text. Experimental results demonstrate that SEER achieves approximately 2.08% bandwidth compression, while outperforming existing methods under extreme channel conditions by 33.92% in structural fidelity and 12.64% in perceptual consistency, highlighting its strong engineering deployability.
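The abstract does not define how the 2.08% bandwidth figure is measured. As a rough illustration only, the sketch below assumes it is the ratio of transmitted data to raw image data, and applies it to a hypothetical 256x256 RGB image. The function name `transmitted_bytes`, the image size, and the bit depth are illustrative assumptions, not details from the paper.

```python
def transmitted_bytes(height: int, width: int, channels: int = 3,
                      bits_per_sample: int = 8, ratio: float = 0.0208) -> float:
    """Payload size in bytes if only `ratio` of the raw image data is sent.

    Assumption: `ratio` means transmitted size divided by raw size.
    """
    raw_bytes = height * width * channels * bits_per_sample / 8
    return raw_bytes * ratio


# Hypothetical 256x256 RGB image at 8 bits per sample (not from the paper).
raw = 256 * 256 * 3                 # uncompressed size in bytes
sent = transmitted_bytes(256, 256)  # payload under the assumed 2.08% ratio
print(f"raw: {raw} bytes, sent: {sent:.0f} bytes, ratio 1/{raw / sent:.1f}")
# prints "raw: 196608 bytes, sent: 4089 bytes, ratio 1/48.1"
```

Under this reading, 2.08% is close to 1/48 of the raw data, so each transmitted byte would stand in for roughly 48 bytes of the original image.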


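Source journal

Knowledge-Based Systems (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months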

Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.