Stacked Intelligent Metasurfaces for Multi-Modal Semantic Communications

IF 5.5 · Zone 3 (Computer Science) · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Wireless Communications Letters · Pub Date: 2025-06-17 · DOI: 10.1109/LWC.2025.3580441

Guojun Huang, Jiancheng An, Lu Gan, Dusit Niyato, Mérouane Debbah, Tie Jun Cui

Vol. 14, No. 9, pp. 2828-2832 · https://ieeexplore.ieee.org/document/11038827/

Citations: 0

Abstract

Semantic communication (SemCom) powered by generative artificial intelligence enables highly efficient and reliable information transmission. However, it still necessitates the transmission of substantial amounts of data when dealing with complex scene information. In contrast, the stacked intelligent metasurface (SIM), leveraging wave-domain computing, provides a cost-effective solution for directly imaging complex scenes. Building on this concept, we propose an innovative SIM-aided multi-modal SemCom system. Specifically, an SIM is positioned in front of the transmit antenna for transmitting visual semantic information of complex scenes via imaging on the uniform planar array at the receiver. Furthermore, the simple scene description that contains textual semantic information is transmitted via amplitude-phase modulation over electromagnetic waves. To simultaneously transmit multi-modal information, we optimize the amplitude and phase of meta-atoms in the SIM using a customized gradient descent algorithm. The optimization aims to gradually minimize the mean squared error between the normalized energy distribution on the receiver array and the desired pattern corresponding to the visual semantic information. By combining the textual and visual semantic information, a conditional generative adversarial network is used to recover the complex scene accurately. Extensive numerical results verify the effectiveness of the proposed multi-modal SemCom system in reducing bandwidth overhead as well as the capability of the SIM for imaging the complex scene.
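The abstract says the SIM is optimized so that the normalized energy distribution on the receiver UPA approaches a desired pattern, by minimizing their mean squared error with a customized gradient descent. The letter's exact channel model and algorithm are not reproduced here, so the following is only a minimal NumPy sketch under stated assumptions: a single transmit antenna, an assumed Rayleigh-Sommerfeld-style diffraction coefficient between planar layers, phase-only meta-atoms (the letter also optimizes amplitude), and a normalized-gradient step with a simple accept/shrink rule standing in for the customized algorithm. All function names (`build_sim`, `loss_and_grad`, `optimize`, etc.), the geometry parameters, and the 4×4 target pattern are hypothetical. The gradient is obtained by backpropagating Wirtinger derivatives through the cascade of diffraction matrices and phase masks.

```python
import numpy as np

WAVELENGTH = 1.0  # all lengths are in units of the wavelength (assumption)


def grid(n_side, spacing, z):
    """Element positions of an n_side x n_side planar array centred on the z-axis at depth z."""
    idx = (np.arange(n_side) - (n_side - 1) / 2) * spacing
    xx, yy = np.meshgrid(idx, idx, indexing="ij")
    return np.stack([xx.ravel(), yy.ravel(), np.full(xx.size, float(z))], axis=1)


def diffraction(src, dst, area):
    """Assumed Rayleigh-Sommerfeld-style coefficients, shape (len(dst), len(src))."""
    diff = dst[:, None, :] - src[None, :, :]
    d = np.linalg.norm(diff, axis=2)
    cos_chi = np.abs(diff[:, :, 2]) / d
    amp = area * cos_chi / d * (1 / (2 * np.pi * d) - 1j / WAVELENGTH)
    return amp * np.exp(2j * np.pi * d / WAVELENGTH)


def build_sim(n_layers=3, n_side=8, rx_side=4, spacing=0.5, gap=2.5, rx_dist=10.0):
    """Hypothetical geometry: one TX antenna, n_layers metasurfaces, rx_side x rx_side UPA."""
    area = spacing**2
    layers = [grid(n_side, spacing, gap * l) for l in range(n_layers)]
    tx = np.array([[0.0, 0.0, -gap]])
    rx = grid(rx_side, spacing, gap * (n_layers - 1) + rx_dist)
    h = diffraction(tx, layers[0], area)[:, 0]  # field incident on layer 1
    W = [diffraction(layers[l - 1], layers[l], area) for l in range(1, n_layers)]
    G = diffraction(layers[-1], rx, area)  # last layer -> receiver UPA
    return h, W, G


def forward(thetas, h, W, G, symbol=1.0):
    """Received field y and each layer's output x_l; `symbol` scales the TX signal."""
    xs, u = [], symbol * h
    for l, theta in enumerate(thetas):
        if l > 0:
            u = W[l - 1] @ xs[-1]
        xs.append(np.exp(1j * theta) * u)  # phase-only meta-atoms (unit amplitude)
    return G @ xs[-1], xs


def normalized_pattern(thetas, h, W, G, symbol=1.0):
    p = np.abs(forward(thetas, h, W, G, symbol)[0]) ** 2
    return p / p.sum()


def loss_and_grad(thetas, h, W, G, target):
    """MSE between normalized RX energy and target, and dJ/dtheta_l by backpropagation."""
    y, xs = forward(thetas, h, W, G)
    p = np.abs(y) ** 2
    s = p.sum()
    q = p / s
    loss = np.mean((q - target) ** 2)
    g = 2 * (q - target) / q.size  # dJ/dq
    r = g / s - (g @ p) / s**2  # dJ/dp, through the normalization q = p / sum(p)
    delta = G.conj().T @ (r * y)  # Wirtinger derivative dJ/d(conj(x_L))
    grads = [None] * len(thetas)
    for l in range(len(thetas) - 1, -1, -1):
        grads[l] = 2 * np.imag(np.conj(xs[l]) * delta)
        if l > 0:  # step back through the phase mask and the diffraction matrix
            delta = W[l - 1].conj().T @ (np.exp(-1j * thetas[l]) * delta)
    return loss, grads


def optimize(target, n_iters=300, step=0.1, seed=0):
    """Normalized gradient descent; step grows on success, halves on failure."""
    h, W, G = build_sim()
    rng = np.random.default_rng(seed)
    thetas = [rng.uniform(0, 2 * np.pi, h.size) for _ in range(len(W) + 1)]
    loss, grads = loss_and_grad(thetas, h, W, G, target)
    history = [loss]
    for _ in range(n_iters):
        scale = max(np.abs(g).max() for g in grads)  # largest phase update = step rad
        trial = [t - step * g / scale for t, g in zip(thetas, grads)]
        trial_loss, trial_grads = loss_and_grad(trial, h, W, G, target)
        if trial_loss < loss:
            thetas, loss, grads = trial, trial_loss, trial_grads
            step *= 1.2
        else:
            step *= 0.5
        history.append(loss)
    return thetas, history, (h, W, G)


if __name__ == "__main__":
    target = np.zeros((4, 4))
    target[1:3, :] = 1.0  # hypothetical "visual semantic" pattern: a horizontal bar
    target = (target / target.sum()).ravel()
    thetas, history, (h, W, G) = optimize(target)
    print(f"MSE: {history[0]:.2e} -> {history[-1]:.2e}")
    # Scaling the input by a complex symbol leaves the normalized pattern unchanged.
    q_ref = normalized_pattern(thetas, h, W, G)
    print(np.allclose(q_ref, normalized_pattern(thetas, h, W, G, symbol=3 - 1j)))
```

Because this simplified model is linear in the transmitted signal, multiplying the input by an amplitude-phase modulated symbol scales the received field uniformly and leaves the normalized energy pattern untouched, which gives one intuition for how the textual and visual streams can share the same link. The accept/shrink rule only guarantees that the MSE never increases; it is not a description of how the letter's customized algorithm chooses its step size.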




Source Journal

IEEE Wireless Communications Letters (Engineering: Electrical and Electronic Engineering)

CiteScore: 12.30

Self-citation rate: 6.30%

Articles published: 481

Journal description: IEEE Wireless Communications Letters publishes short papers in a rapid publication cycle on advances in the state-of-the-art of wireless communications. Both theoretical contributions (including new techniques, concepts, and analyses) and practical contributions (including system experiments and prototypes, and new applications) are encouraged. This journal focuses on the physical layer and the link layer of wireless communication systems.