Generative AI-Based Vector Quantized End-to-End Semantic Communication System for Wireless Image Transmission

IEEE Transactions on Machine Learning in Communications and Networking. Pub Date: 2025-09-09. DOI: 10.1109/TMLCN.2025.3607891

Maheshi Lokumarambage;Thushan Sivalingam;Feng Dong;Nandana Rajatheva;Anil Fernando

Volume 3, pages 1050-1074. Full text: https://ieeexplore.ieee.org/document/11154002/

Citations: 0

Abstract

Semantic communication (SemCom) systems enhance transmission efficiency by conveying semantic information in lieu of raw data. However, designing these systems is challenging: the semantic source coding must represent information beyond the training dataset, performance must remain channel-agnostic, and the system must be robust to both channel and semantic noise. We propose a novel generative artificial intelligence (AI) based SemCom architecture conditioned on a quantized latent. The system reduces the communication overhead of the wireless channel by mapping each quantized vector to a learned codebook vector and transmitting only its index over the channel. The learned codebook serves as the shared knowledge base. The encoder uses a novel spatial attention mechanism based on image energy, which focuses on object edges. The critic assesses the realism of generated data relative to the original distribution using the Wasserstein distance. The model introduces novel contrastive objectives at multiple levels (pixel, latent, perceptual, and task output) tailored for noisy wireless semantic communication. We validated the proposed model for transmission quality and robustness with low-density parity-check (LDPC) coding; it outperforms the Better Portable Graphics (BPG) baselines, specifically at low signal-to-noise ratio (SNR) levels ($< 5$ dB). Additionally, it shows comparable results to joint source-channel coding (JSCC) with lower complexity and latency. The model is validated for both human perception and machine perception-oriented task utility. The model effectively transmits high-resolution images without requiring additional error correction at the receiver. We propose a novel semantic-based matrix to evaluate the robustness to noise and task-specific semantic distortion.
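
The index-transmission idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the codebook here is random rather than learned, the channel is assumed noiseless, and the sizes `K`, `D`, `N` and the helper names `quantize` and `dequantize` are hypothetical choices made only for illustration. The sketch shows how sending one codebook index per latent vector, with the codebook shared by both ends, needs far fewer bits than sending the raw floating-point vectors.

```python
import numpy as np

def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codebook vector."""
    # Squared Euclidean distance between every latent (N, D) and every code (K, D).
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def dequantize(indices, codebook):
    """Receiver side: look the indices up in the shared codebook."""
    return codebook[indices]

rng = np.random.default_rng(0)
K, D, N = 256, 16, 64  # hypothetical: codebook size, latent dimension, latents per image

# Stand-in for a learned codebook (assumption: random values, for illustration only).
codebook = rng.normal(size=(K, D)).astype(np.float32)
# Stand-in encoder output: codebook vectors plus a small perturbation.
noise = 0.01 * rng.normal(size=(N, D)).astype(np.float32)
latents = codebook[rng.integers(0, K, size=N)] + noise

indices = quantize(latents, codebook)      # what the transmitter sends
recovered = dequantize(indices, codebook)  # what the receiver reconstructs

bits_per_index = int(np.ceil(np.log2(K)))  # bits needed to address one code
index_bits = N * bits_per_index            # payload when sending indices only
raw_bits = N * D * 32                      # payload when sending float32 latents
print(bits_per_index, index_bits, raw_bits)
```

In this toy setting the payload depends only on the number of latents and the codebook size, not on the latent dimension, which is why a shared codebook can act as the knowledge base between transmitter and receiver. Channel noise, channel coding, and the generative decoder described in the abstract are outside the scope of this sketch.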


