Maheshi Lokumarambage;Thushan Sivalingam;Feng Dong;Nandana Rajatheva;Anil Fernando
{"title":"Generative AI-Based Vector Quantized End-to-End Semantic Communication System for Wireless Image Transmission","authors":"Maheshi Lokumarambage;Thushan Sivalingam;Feng Dong;Nandana Rajatheva;Anil Fernando","doi":"10.1109/TMLCN.2025.3607891","DOIUrl":null,"url":null,"abstract":"Semantic communication (SemCom) systems enhance transmission efficiency by conveying semantic information in lieu of raw data. However, challenges arise when designing these systems due to the need for robust semantic source coding for information representation extending beyond the training dataset, maintaining channel-agnostic performance, and ensuring robustness to channel and semantic noise. We propose a novel generative artificial intelligence (AI) based SemCom architecture conditioned on quantized latent. The system reduces the communication overhead of the wireless channel by transmitting the index of the quantized latent over the communication channel by mapping the quantized vector to the learned codebook vectors. The learned codebook is the shared knowledge base. The encoder is designed with a novel spatial attention mechanism based on image energy, focusing on object edges. The critic assesses the realism of generated data relative to the original distribution, with the Wasserstein distance. The model introduces novel contrastive objectives at multiple levels, including pixel, latent, perceptual, and task output, tailored for noisy wireless semantic communication. We validated the proposed model for transmission quality and robustness with low-density parity-check (LDPC), which outperforms the baselines of better portable graphics (BPG), specifically at low signal-to-noise ratio (SNR) levels (<inline-formula> <tex-math>$ {\\lt }~ {5}$ </tex-math></inline-formula> dB). Additionally, it shows comparable results with joint source-channel coding (JSCC) with lower complexity and latency. The model is validated for human perception and machine perception-oriented task utility. The model effectively transmits high-resolution images without requiring additional error correction at the receiver. We propose a novel semantic-based matrix to evaluate the robustness to noise and task-specific semantic distortion.","PeriodicalId":100641,"journal":{"name":"IEEE Transactions on Machine Learning in Communications and Networking","volume":"3 ","pages":"1050-1074"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11154002","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Machine Learning in Communications and Networking","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11154002/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Semantic communication (SemCom) systems enhance transmission efficiency by conveying semantic information in lieu of raw data. However, challenges arise when designing these systems due to the need for robust semantic source coding for information representation extending beyond the training dataset, maintaining channel-agnostic performance, and ensuring robustness to channel and semantic noise. We propose a novel generative artificial intelligence (AI) based SemCom architecture conditioned on quantized latent. The system reduces the communication overhead of the wireless channel by transmitting the index of the quantized latent over the communication channel by mapping the quantized vector to the learned codebook vectors. The learned codebook is the shared knowledge base. The encoder is designed with a novel spatial attention mechanism based on image energy, focusing on object edges. The critic assesses the realism of generated data relative to the original distribution, with the Wasserstein distance. The model introduces novel contrastive objectives at multiple levels, including pixel, latent, perceptual, and task output, tailored for noisy wireless semantic communication. We validated the proposed model for transmission quality and robustness with low-density parity-check (LDPC), which outperforms the baselines of better portable graphics (BPG), specifically at low signal-to-noise ratio (SNR) levels ($ {\lt }~ {5}$ dB). Additionally, it shows comparable results with joint source-channel coding (JSCC) with lower complexity and latency. The model is validated for human perception and machine perception-oriented task utility. The model effectively transmits high-resolution images without requiring additional error correction at the receiver. We propose a novel semantic-based matrix to evaluate the robustness to noise and task-specific semantic distortion.