SEER: Knowledge-driven semantic image restoration with vision-language diffusion alignment

IF 7.6 · CAS Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Shengliang Wu, Jun Jiang, Xin He, Yong Xu, Yujun Zhu, Weiwei Jiang, Heju Li
Knowledge-Based Systems, vol. 329, Article 114464. Journal Article. Published 2025-09-12.
DOI: 10.1016/j.knosys.2025.114464
https://www.sciencedirect.com/science/article/pii/S0950705125015035

Abstract

Semantic communication is an emerging paradigm for improving network efficiency and perceptual quality, and it shows particular promise in image generation tasks. However, existing deep learning (DL)-based single-modal reconstruction approaches often suffer from semantic distortion and image blurring under bandwidth-limited and highly noisy channel conditions, which limits their suitability for task-oriented perception scenarios. Although generative AI-based semantic communication can significantly reduce data transmission volume, its high sensitivity to channel noise and its lack of dynamic adaptation mechanisms limit the stability of reconstruction. To address these challenges, this paper proposes a multi-modal semantic communication framework named SEER, designed for resource-constrained intelligent sensing terminals. Built upon a pretrained language model, SEER incorporates a channel-aware prompt control strategy, a dual-modal integrative semantic restoration mechanism (DISR), and a single-pass sequential cross-modal reconstruction pathway to achieve collaborative semantic representation and robust structural recovery between images and text. Experimental results demonstrate that SEER achieves approximately 2.08% bandwidth compression while outperforming existing methods under extreme channel conditions by 33.92% in structural fidelity and 12.64% in perceptual consistency, highlighting its suitability for engineering deployment.
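To give a rough sense of scale for the reported 2.08% figure, the sketch below assumes it denotes the ratio of transmitted payload size to original image size. The abstract does not define the metric, so this reading, the function name `transmitted_bytes`, and the 256×256 RGB example image are all illustrative assumptions rather than details from the paper.

```python
def transmitted_bytes(original_bytes: int, ratio: float = 0.0208) -> int:
    """Return the payload size, ASSUMING `ratio` means transmitted / original size."""
    return round(original_bytes * ratio)


# Hypothetical input: an uncompressed 256x256 RGB image, one byte per channel.
original = 256 * 256 * 3
payload = transmitted_bytes(original)

print(f"original: {original} bytes, transmitted: {payload} bytes")
```

Under that assumed reading, an image of roughly 192 KiB would be sent as about 4 KB of semantic payload.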
Source journal: Knowledge-Based Systems (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months
Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.