Improved Nonlinear Transform Source-Channel Coding to Catalyze Semantic Communications

IF 8.7, Zone 1 (Engineering & Technology), Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing, Pub Date: 2023-08-10, DOI: 10.1109/JSTSP.2023.3304140

Sixian Wang;Jincheng Dai;Xiaoqi Qin;Zhongwei Si;Kai Niu;Ping Zhang

Vol. 17, No. 5, pp. 1022-1037. https://ieeexplore.ieee.org/document/10214392/

Citations: 1

Abstract

Recent deep learning methods have led to increased interest in solving high-efficiency end-to-end transmission problems. These methods, which we call nonlinear transform source-channel coding (NTSCC), extract the semantic latent features of the source signal and learn an entropy model that guides variable-rate joint source-channel coding to transmit the latent features over wireless channels. In this article, we propose a comprehensive framework for improving NTSCC, which achieves higher system coding gain, better model compatibility, and a more flexible adaptation strategy aligned with semantic guidance. The improved NTSCC model is ready to support large-size data interaction in emerging XR, which catalyzes the application of semantic communications. Specifically, we propose three improvements. First, we introduce a contextual entropy model to better capture the spatial correlations among the semantic latent features; more accurate rate allocation and a contextual joint source-channel coding method are developed accordingly to enable higher coding gain. On that basis, we further propose a response network architecture to formulate compatible NTSCC, i.e., a once-learned model supports various bandwidth ratios and channel states, which greatly benefits practical deployment. Following this, we propose an online latent feature editing mechanism to enable more flexible coding rate allocation aligned with specific semantic guidance. Applying these three improvements together yields a deployment-friendly semantic coded transmission system. Our improved NTSCC system has been experimentally verified to achieve better rate-distortion efficiency than the state-of-the-art engineered VTM + 5G LDPC coded transmission system, with lower processing latency.
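
The abstract's idea of an entropy model guiding variable-rate coding can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the scaling factor `eta`, and the set of allowed symbol counts `rate_levels` are all hypothetical, and the rule used here (channel symbols roughly proportional to the estimated information content of each latent vector, rounded up to an allowed level) is an assumption chosen for illustration.

```python
import math


def allocate_channel_symbols(likelihoods, eta=1.0, rate_levels=(4, 8, 16, 32, 64)):
    """Assign a channel-symbol budget to each latent vector (illustrative only).

    likelihoods: list of lists; likelihoods[i][j] is the probability that a
        (hypothetical) learned entropy model assigns to element j of latent
        vector i.
    eta: hypothetical factor converting estimated bits to channel symbols.
    rate_levels: hypothetical allowed symbol counts, in ascending order.
    """
    allocation = []
    for probs in likelihoods:
        # Estimated information content of this latent vector, in bits.
        bits = -sum(math.log2(p) for p in probs)
        target = eta * bits
        # Smallest allowed level that covers the target; cap at the largest.
        level = next((r for r in rate_levels if r >= target), rate_levels[-1])
        allocation.append(level)
    return allocation


# Predictable latents (high likelihood) receive few symbols,
# surprising latents (low likelihood) receive more.
budgets = allocate_channel_symbols([[0.5] * 2, [0.5] * 8, [0.25] * 10, [0.25] * 100])
```

In this sketch a more accurate entropy model changes only the `likelihoods` input, which is how a better context model would translate into a different rate allocation.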




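Source journal

IEEE Journal of Selected Topics in Signal Processing (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 19.00

Self-citation rate: 1.30%

Articles published: 135

Review time: 3 months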

Journal description: The IEEE Journal of Selected Topics in Signal Processing (JSTSP) focuses on the Field of Interest of the IEEE Signal Processing Society, which encompasses the theory and application of various signal processing techniques. These techniques include filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals using digital or analog devices. The term "signal" covers a wide range of data types, including audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and others. The journal format allows for in-depth exploration of signal processing topics, enabling the Society to cover both established and emerging areas. This includes interdisciplinary fields such as biomedical engineering and language processing, as well as areas not traditionally associated with engineering.