Stacked Intelligent Metasurfaces for Multi-Modal Semantic Communications

Impact factor 5.5 | CAS Zone 3 (Computer Science) | JCR Q1 (Computer Science, Information Systems)
Guojun Huang;Jiancheng An;Lu Gan;Dusit Niyato;Mérouane Debbah;Tie Jun Cui
IEEE Wireless Communications Letters, vol. 14, no. 9, pp. 2828-2832, published 2025-06-17
DOI: 10.1109/LWC.2025.3580441
URL: https://ieeexplore.ieee.org/document/11038827/
Citations: 0

Abstract

Semantic communication (SemCom) powered by generative artificial intelligence enables highly efficient and reliable information transmission. However, it still necessitates the transmission of substantial amounts of data when dealing with complex scene information. In contrast, the stacked intelligent metasurface (SIM), leveraging wave-domain computing, provides a cost-effective solution for directly imaging complex scenes. Building on this concept, we propose an innovative SIM-aided multi-modal SemCom system. Specifically, an SIM is positioned in front of the transmit antenna for transmitting visual semantic information of complex scenes via imaging on the uniform planar array at the receiver. Furthermore, the simple scene description that contains textual semantic information is transmitted via amplitude-phase modulation over electromagnetic waves. To simultaneously transmit multi-modal information, we optimize the amplitude and phase of meta-atoms in the SIM using a customized gradient descent algorithm. The optimization aims to gradually minimize the mean squared error between the normalized energy distribution on the receiver array and the desired pattern corresponding to the visual semantic information. By combining the textual and visual semantic information, a conditional generative adversarial network is used to recover the complex scene accurately. Extensive numerical results verify the effectiveness of the proposed multi-modal SemCom system in reducing bandwidth overhead as well as the capability of the SIM for imaging the complex scene.
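The wave-domain optimization described in the abstract can be illustrated with a small numerical sketch. It is not the authors' customized algorithm: the propagation matrices here are random complex placeholders rather than a diffraction model, the sizes `N`, `L`, and `M` are arbitrary, the amplitude bounds are an assumption, and gradients are taken by central finite differences with a backtracking step size. All function and variable names are hypothetical. Only the imaging (visual) branch is modeled; the amplitude-phase modulated text branch is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper):
# N meta-atoms per layer, L metasurface layers, M receive-array elements.
N, L, M = 9, 2, 16

def crandn(*shape):
    """Random complex Gaussian array, a stand-in for a real propagation model."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

w0 = crandn(N)                            # transmit antenna -> first layer
W = [crandn(N, N) for _ in range(L - 1)]  # layer l -> layer l+1
G = crandn(M, N)                          # last layer -> receive array

# Desired normalized energy pattern (a random placeholder for the visual semantics).
target = rng.random(M)
target /= target.sum()

def energy_pattern(params):
    """Normalized energy distribution on the receive array for given meta-atom settings."""
    amp = params[:L * N].reshape(L, N)
    phase = params[L * N:].reshape(L, N)
    field = w0
    for l in range(L):
        field = amp[l] * np.exp(1j * phase[l]) * field  # meta-atom transmission
        if l < L - 1:
            field = W[l] @ field                        # propagate to the next layer
    energy = np.abs(G @ field) ** 2
    return energy / energy.sum()

def mse(params):
    """Mean squared error between the received and desired normalized patterns."""
    return np.mean((energy_pattern(params) - target) ** 2)

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient (a simplification for this sketch)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        delta = np.zeros_like(x)
        delta[i] = eps
        g[i] = (f(x + delta) - f(x - delta)) / (2 * eps)
    return g

def project(params):
    """Keep amplitudes in [0.01, 1]; the bounds are assumed. Phases are unconstrained."""
    out = params.copy()
    out[:L * N] = np.clip(out[:L * N], 0.01, 1.0)
    return out

def optimize(params, iters=100):
    """Gradient descent that only accepts steps that lower the MSE."""
    history = [mse(params)]
    step = 1.0
    for _ in range(iters):
        g = numerical_grad(mse, params)
        while step > 1e-12:
            cand = project(params - step * g)
            cand_loss = mse(cand)
            if cand_loss < history[-1]:
                params, step = cand, step * 2.0  # accept and try a larger step next
                history.append(cand_loss)
                break
            step *= 0.5                          # reject and shrink the step
        else:
            break                                # no improving step found
    return params, history

init = np.concatenate([rng.uniform(0.5, 1.0, L * N), rng.uniform(0.0, 2 * np.pi, L * N)])
params_opt, history = optimize(init)
print(f"MSE: {history[0]:.3e} -> {history[-1]:.3e} after {len(history) - 1} accepted steps")
```

Because the pattern is normalized before the error is computed, the loss does not depend on the overall transmit power, only on how the energy is distributed across the receive elements.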
Source journal
IEEE Wireless Communications Letters
Subject area: Engineering, Electrical and Electronic Engineering
CiteScore: 12.30
Self-citation rate: 6.30%
Articles published: 481
About the journal: IEEE Wireless Communications Letters publishes short papers in a rapid publication cycle on advances in the state of the art of wireless communications. Both theoretical contributions (including new techniques, concepts, and analyses) and practical contributions (including system experiments and prototypes, and new applications) are encouraged. The journal focuses on the physical layer and the link layer of wireless communication systems.