Synchronous Multi-Modal Semantic Communication System With Packet-Level Coding

IF 10.7 · Zone 1, Computer Science · Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Wireless Communications Pub Date : 2025-02-04 DOI:10.1109/TWC.2025.3534995

Yun Tian;Jingkai Ying;Zhijin Qin;Ye Jin;Xiaoming Tao

Volume 24, Issue 5, pages 3684-3697. Full text: https://ieeexplore.ieee.org/document/10872781/

Citations: 0

Abstract

Although semantic communication with joint semantic-channel coding has shown promising performance in transmitting data of different modalities over physical-layer channels, the synchronization and packet-level forward error correction (FEC) of multimodal semantics have not been well studied. Synchronizing multimodal features in both the semantic and time domains is challenging because the semantic encoders are designed independently. In this paper, we take facial video and speech transmission as an example and propose a Synchronous Multi-modal Semantic Communication System with Packet-Level Coding (SyncSC). To achieve semantic and time synchronization, 3D Morphable Model (3DMM) coefficients and text are transmitted as semantics. We propose a semantic codec that achieves similar reconstruction quality with lower bandwidth. A visual-guided speech synthesis module is designed to synchronize video, text, and speech. We propose a packet-level FEC method for video semantics, called PacSC, that maintains visual quality even at high packet loss rates. For text packets, we propose a text packet loss concealment module, called TextPC, based on Bidirectional Encoder Representations from Transformers (BERT), which improves the performance of traditional FEC methods. Simulation results show that SyncSC reduces transmission overhead while ensuring high-quality synchronous transmission of video and speech over networks with packet loss.
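To make the term "packet-level FEC" concrete, the following is a minimal sketch of the simplest classical scheme: one XOR parity packet per group, which lets the receiver rebuild a single lost packet. This is not the PacSC method from the paper, whose design the abstract does not describe; the function names `add_parity` and `recover`, the group size, and the example payloads are all assumptions made for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets: List[bytes]) -> List[bytes]:
    """Append one XOR parity packet to a group of equal-length packets."""
    if len({len(p) for p in packets}) != 1:
        raise ValueError("all packets in a group must have the same length")
    return packets + [reduce(xor_bytes, packets)]


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Return the data packets, rebuilding at most one lost packet (None)."""
    lost = [i for i, p in enumerate(received) if p is None]
    if len(lost) > 1:
        raise ValueError("single parity can repair only one loss per group")
    if lost:
        # XOR of all surviving packets (data and parity) equals the lost one.
        survivors = [p for p in received if p is not None]
        received = list(received)
        received[lost[0]] = reduce(xor_bytes, survivors)
    return received[:-1]  # drop the trailing parity packet


# Hypothetical payloads standing in for packetized semantics.
group = add_parity([b"coef", b"text", b"sync"])
damaged = [group[0], None, group[2], group[3]]  # second packet lost in transit
restored = recover(damaged)
```

A single parity packet fails as soon as two packets in a group are lost, which is the high-loss regime the abstract says PacSC and TextPC are meant to handle.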

Source journal

IEEE Transactions on Wireless Communications (Engineering & Technology: Telecommunications)

CiteScore: 18.60
Self-citation rate: 10.60%
Articles published: 708
Review time: 5.6 months

About the journal: The IEEE Transactions on Wireless Communications is a prestigious publication that showcases cutting-edge advancements in wireless communications. It welcomes both theoretical and practical contributions in various areas. The scope of the Transactions encompasses a wide range of topics, including modulation and coding, detection and estimation, propagation and channel characterization, and diversity techniques. The journal also emphasizes the physical and link layer communication aspects of network architectures and protocols. The journal is open to papers on specific topics or non-traditional topics related to specific application areas. This includes simulation tools and methodologies, orthogonal frequency division multiplexing, MIMO systems, and wireless over optical technologies. Overall, the IEEE Transactions on Wireless Communications serves as a platform for high-quality manuscripts that push the boundaries of wireless communications and contribute to advancements in the field.