Task Partitioning and Scheduling Based on Stochastic Policy Gradient in Mobile Crowdsensing

IF 4.5 · CAS Zone 2 (Computer Science) · JCR Q1, Computer Science, Cybernetics

IEEE Transactions on Computational Social Systems · Pub Date: 2024-06-06 · DOI: 10.1109/TCSS.2024.3398430

Tianjing Wang;Yu Zhang;Hang Shen;Guangwei Bai

Volume 11, Issue 5, pp. 6580-6591. https://ieeexplore.ieee.org/document/10550173/

Citations: 0

Abstract

Deep reinforcement learning (DRL) has become prevalent for task assignment decisions in mobile crowdsensing (MCS). However, when facing sensing scenarios with varying numbers of workers or task attributes, existing DRL-based task assignment schemes fail to generate matching policies continuously and are susceptible to environmental fluctuations. To overcome these issues, a twin-delayed deep stochastic policy gradient (TDDS) approach is presented for balanced and low-latency MCS task decomposition and parallel subtask allocation. A masked attention mechanism is incorporated into the policy network to enable TDDS to adapt to task-attribute and subtask variations. To enhance environmental adaptability, an off-policy DRL algorithm incorporating experience replay is developed to eliminate sample correlation during training. Gumbel-Softmax sampling is integrated into the twin-delayed deep deterministic policy gradient (TD3) algorithm to support decisions over a discrete action space, and a customized reward strategy is adopted to reduce task completion delay and balance workloads. Extensive simulation results confirm that the proposed scheme outperforms mainstream DRL baselines in terms of environmental adaptability, task completion delay, and workload balancing.
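
The abstract mentions Gumbel-Softmax sampling for discrete actions and a mask that lets the policy cope with a varying set of candidates. The following is a minimal NumPy sketch of that general idea, not the paper's implementation: the function name `masked_gumbel_softmax`, the temperature `tau`, and the way the mask is applied directly to the logits are all assumptions made for illustration.

```python
import numpy as np


def masked_gumbel_softmax(logits, mask, tau=1.0, rng=None):
    """Draw a relaxed one-hot sample over the candidates allowed by `mask`.

    Hypothetical helper for illustration only.
    logits: 1-D array of unnormalized scores, one per candidate (e.g. worker).
    mask:   1-D boolean array, True where the candidate is currently available.
    tau:    temperature; smaller values push the sample toward one-hot.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask must allow at least one candidate")

    # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))

    # Perturb the logits, scale by temperature, and exclude masked entries.
    perturbed = np.where(mask, (logits + gumbel) / tau, -np.inf)

    # Numerically stable softmax; masked entries get exactly zero weight.
    perturbed = perturbed - perturbed.max()
    weights = np.exp(perturbed)
    return weights / weights.sum()


rng = np.random.default_rng(0)
sample = masked_gumbel_softmax(
    logits=[1.0, 2.0, 0.5, 3.0],
    mask=[True, False, True, True],  # candidate 1 is unavailable
    tau=0.5,
    rng=rng,
)
action = int(np.argmax(sample))  # discrete choice derived from the relaxed sample
```

In a gradient-based actor such as TD3, the relaxed sample would typically be fed to the critic so that gradients can flow through the sampling step, while the `argmax` gives the discrete action executed in the environment. How TDDS combines these pieces is not specified in the abstract.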


Source journal

IEEE Transactions on Computational Social Systems
Subject area: Social Sciences, Social Sciences (miscellaneous)

CiteScore: 10.00
Self-citation rate: 20.00%
Articles published: 316

About the journal: IEEE Transactions on Computational Social Systems focuses on such topics as modeling, simulation, analysis and understanding of social systems from the quantitative and/or computational perspective. "Systems" include man-man, man-machine and machine-machine organizations and adversarial situations as well as social media structures and their dynamics. More specifically, the proposed transactions publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, and computational behavior modeling, and their applications.