Synchronous Multi-Modal Semantic Communication System With Packet-Level Coding

IF 10.7; Zone 1 (Computer Science); JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Yun Tian;Jingkai Ying;Zhijin Qin;Ye Jin;Xiaoming Tao
Journal: IEEE Transactions on Wireless Communications, vol. 24, no. 5, pp. 3684-3697
Publication date: 2025-02-04
Publication type: Journal Article
DOI: 10.1109/TWC.2025.3534995
URL: https://ieeexplore.ieee.org/document/10872781/
Citations: 0

Abstract

Although semantic communication with joint semantic-channel coding has shown promising performance in transmitting data of different modalities over physical-layer channels, the synchronization and packet-level forward error correction (FEC) of multimodal semantics have not been well studied. Synchronizing multimodal features in both the semantic and time domains is challenging because the semantic encoders are designed independently. In this paper, we take facial video and speech transmission as an example and propose a Synchronous Multi-modal Semantic Communication System with Packet-Level Coding (SyncSC). To achieve semantic and time synchronization, 3D Morphable Model (3DMM) coefficients and text are transmitted as semantics. We propose a semantic codec that achieves similar reconstruction quality with lower bandwidth. A visual-guided speech synthesis module is designed to synchronize video, text, and speech. We propose a packet-level FEC method for video semantics, called PacSC, that maintains visual quality even at high packet loss rates. For text packets, we propose a text packet loss concealment module, called TextPC, based on Bidirectional Encoder Representations from Transformers (BERT), which improves the performance of traditional FEC methods. Simulation results show that SyncSC reduces transmission overhead while ensuring high-quality synchronous transmission of video and speech over networks with packet loss.
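
To make the idea of packet-level FEC concrete, the following is a minimal sketch of a conventional single-parity XOR scheme, the kind of traditional baseline the abstract alludes to. It is not PacSC or TextPC, whose designs are not described here. The function names (`xor_bytes`, `add_parity`, `recover`) and the assumption of equal-length packets are illustrative choices only.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets: List[bytes]) -> List[bytes]:
    """Return the data packets followed by one XOR parity packet.

    Assumption for this sketch: all packets have the same length.
    """
    parity = reduce(xor_bytes, packets)
    return packets + [parity]


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Return the data packets, repairing at most one lost packet.

    `received` has the layout produced by add_parity, with None
    standing in for a packet that was lost in transit.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single-parity FEC cannot repair more than one loss")
    repaired = list(received)
    if missing:
        # XOR of all surviving packets equals the missing one.
        present = [p for p in received if p is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity packet
```

With one parity packet per group, any single loss in the group can be repaired, but two or more losses cannot. This limitation at high loss rates is the regime in which the abstract claims PacSC maintains visual quality.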
Source journal
CiteScore: 18.60
Self-citation rate: 10.60%
Articles published: 708
Review time: 5.6 months
Journal description: The IEEE Transactions on Wireless Communications is a prestigious publication that showcases cutting-edge advancements in wireless communications. It welcomes both theoretical and practical contributions in various areas. The scope of the Transactions encompasses a wide range of topics, including modulation and coding, detection and estimation, propagation and channel characterization, and diversity techniques. The journal also emphasizes the physical and link layer communication aspects of network architectures and protocols. The journal is open to papers on specific topics or non-traditional topics related to specific application areas. This includes simulation tools and methodologies, orthogonal frequency division multiplexing, MIMO systems, and wireless over optical technologies. Overall, the IEEE Transactions on Wireless Communications serves as a platform for high-quality manuscripts that push the boundaries of wireless communications and contribute to advancements in the field.