Shengliang Wu, Jun Jiang, Xin He, Yong Xu, Yujun Zhu, Weiwei Jiang, Heju Li
Journal: Knowledge-Based Systems, vol. 329, Article 114464 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 7.6)
Published: 2025-09-12
Publication type: Journal Article
DOI: 10.1016/j.knosys.2025.114464
URL: https://www.sciencedirect.com/science/article/pii/S0950705125015035
SEER: Knowledge-driven semantic image restoration with vision-language diffusion alignment
Semantic communication is an emerging paradigm to enhance network efficiency and perceptual quality, particularly demonstrating strong potential in image generation tasks. However, existing deep learning (DL)-based single-modal reconstruction approaches often suffer from semantic distortion and image blurring under bandwidth-limited and highly noisy channel conditions, limiting their suitability in task-oriented perception scenarios. Although generative AI-based semantic communication can significantly reduce data transmission volume, its high sensitivity to channel noise and lack of dynamic adaptation mechanisms limit the stability of reconstruction. To address these challenges, this paper proposes a multi-modal semantic communication framework named SEER, designed for resource-constrained intelligent sensing terminals. Built upon a pretrained language model, SEER incorporates a channel-aware prompt control strategy, a dual-modal integrative semantic restoration mechanism (DISR), and a single-pass sequential cross-modal reconstruction pathway to achieve collaborative semantic representation and robust structural recovery between images and text. Experimental results demonstrate that SEER achieves approximately 2.08% bandwidth compression, while outperforming existing methods under extreme channel conditions by 33.92% in structural fidelity and 12.64% in perceptual consistency, highlighting its strong engineering deployability.
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.