Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning

IF 7.9 | Zone 2, Computer Science | Q1 ENGINEERING, MULTIDISCIPLINARY

IEEE Transactions on Network Science and Engineering | Pub Date: 2025-04-08 | DOI: 10.1109/TNSE.2025.3558747

Xianfu Chen;Zhifeng Zhao;Shiwen Mao;Celimuge Wu;Honggang Zhang;Mehdi Bennis

Volume 12, Issue 4, pp. 3244-3258 | Full text: https://ieeexplore.ieee.org/document/10955408/

Citations: 0

Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning

The age of information metric fails to correctly describe the intrinsic semantics of a status update. In an intelligent reflecting surface-aided cooperative relay communication system, we propose the age of semantics (AoS) for measuring the semantic freshness of status updates. Specifically, we focus on status updates from a source node (SN) to the destination, which is formulated as a Markov decision process. The objective of the SN is to maximize the expected satisfaction of AoS and energy consumption under the maximum transmit power constraint. To seek the optimal control policy, we first derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal difference learning framework. However, implementing the online DAC in practice poses a key challenge: it requires infinitely repeated interactions between the SN and the system, which can be dangerous, particularly during exploration. We then put forward a novel offline DAC scheme, which estimates the optimal control policy from a previously collected dataset without any further interactions with the system. Numerical experiments verify the theoretical results and show that our offline DAC scheme significantly outperforms the online DAC scheme and the most representative baselines in terms of mean utility, demonstrating strong robustness to dataset quality.
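
The abstract gives no model details, so the following is only a toy sketch of the offline idea it describes: learning a control policy from a previously collected dataset with no further interaction with the system. It is not the paper's offline DAC scheme. It uses plain tabular Q-learning replayed over logged transitions, and the environment (an age counter that resets on a successful transmission), the utility, and every constant are hypothetical stand-ins rather than the paper's AoS model.

```python
import random

A_MAX = 5          # hypothetical cap on the age counter
P_SUCCESS = 0.8    # hypothetical probability that a transmission succeeds
ENERGY_COST = 1.5  # hypothetical utility penalty per transmission
GAMMA = 0.9        # discount factor


def step(age, action, rng):
    """Toy environment: action 1 transmits, action 0 stays idle."""
    if action == 1 and rng.random() < P_SUCCESS:
        next_age = 0  # a successful update resets the age
    else:
        next_age = min(age + 1, A_MAX)
    # Utility penalizes the resulting age and the energy spent.
    utility = -float(next_age) - ENERGY_COST * action
    return next_age, utility


def collect_dataset(n, seed=0):
    """Log n transitions under a uniformly random behavior policy."""
    rng = random.Random(seed)
    data, age = [], 0
    for _ in range(n):
        action = rng.randrange(2)
        next_age, utility = step(age, action, rng)
        data.append((age, action, utility, next_age))
        age = next_age
    return data


def offline_q_learning(data, sweeps=200, lr=0.1):
    """Fit Q-values by replaying the fixed dataset; never calls step()."""
    q = [[0.0, 0.0] for _ in range(A_MAX + 1)]
    for _ in range(sweeps):
        for age, action, utility, next_age in data:
            target = utility + GAMMA * max(q[next_age])
            q[age][action] += lr * (target - q[age][action])
    return q


dataset = collect_dataset(5000)
q_table = offline_q_learning(dataset)
# Greedy policy: for each age, pick the action with the larger Q-value.
policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(A_MAX + 1)]
print(policy)
```

Only `collect_dataset` touches the environment; `offline_q_learning` sees nothing but the logged tuples, which is the property that removes risky exploration on the live system. How well such a policy performs depends on how thoroughly the behavior policy covered the state-action space, which is the dataset-quality concern the abstract mentions.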

Source journal

IEEE Transactions on Network Science and Engineering | Engineering: Control and Systems Engineering

CiteScore: 12.60
Self-citation rate: 9.10%
Articles published: 393

About the journal: The proposed journal, called the IEEE Transactions on Network Science and Engineering (TNSE), is committed to timely publishing of peer-reviewed technical articles that deal with the theory and applications of network science and the interconnections among the elements in a system that form a network. In particular, the IEEE Transactions on Network Science and Engineering publishes articles on understanding, prediction, and control of structures and behaviors of networks at the fundamental level. The types of networks covered include physical or engineered networks, information networks, biological networks, semantic networks, economic networks, social networks, and ecological networks. Aimed at discovering common principles that govern network structures, network functionalities and behaviors of networks, the journal seeks articles on understanding, prediction, and control of structures and behaviors of networks. Another trans-disciplinary focus of the IEEE Transactions on Network Science and Engineering is the interactions between and co-evolution of different genres of networks.