{"title":"基于层次深度强化学习的不确定电动客车充电调度优化","authors":"Jiaju Qi;Lei Lei;Thorsteinn Jonsson;Dusit Niyato","doi":"10.1109/JIOT.2025.3552532","DOIUrl":null,"url":null,"abstract":"The growing adoption of electric buses (EBs) represents a significant step toward sustainable development. By utilizing Internet of Things (IoT) systems, charging stations can autonomously determine charging schedules based on real-time data. However, optimizing EB charging schedules remains a critical challenge due to uncertainties in travel time, energy consumption, and fluctuating electricity prices. Moreover, to address real-world complexities, charging policies must make decisions efficiently across multiple time scales and remain scalable for large EB fleets. In this article, we propose a hierarchical deep reinforcement learning (HDRL) approach that reformulates the original Markov decision process (MDP) into two augmented MDPs. To solve these MDPs and enable multitimescale decision-making, we introduce a novel HDRL algorithm, namely, double actor-critic multiagent proximal policy optimization enhancement (DAC-MAPPO-E). Scalability challenges of the double actor-critic (DAC) algorithm for large-scale EB fleets are addressed through enhancements at both decision levels. At the high level, we redesign the decentralized actor network and integrate an attention mechanism to extract relevant global state information for each EB, decreasing the size of neural networks. At the low level, the multiagent proximal policy optimization (MAPPO) algorithm is incorporated into the DAC framework, enabling decentralized and coordinated charging power decisions, reducing computational complexity and enhancing convergence speed. Extensive experiments with real-world data demonstrate the superior performance and scalability of DAC-MAPPO-E in optimizing EB fleet charging schedules.","PeriodicalId":54347,"journal":{"name":"IEEE Internet of Things Journal","volume":"12 12","pages":"22427-22442"},"PeriodicalIF":8.9000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Optimizing Electric Bus Charging Scheduling With Uncertainties Using Hierarchical Deep Reinforcement Learning\",\"authors\":\"Jiaju Qi;Lei Lei;Thorsteinn Jonsson;Dusit Niyato\",\"doi\":\"10.1109/JIOT.2025.3552532\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The growing adoption of electric buses (EBs) represents a significant step toward sustainable development. By utilizing Internet of Things (IoT) systems, charging stations can autonomously determine charging schedules based on real-time data. However, optimizing EB charging schedules remains a critical challenge due to uncertainties in travel time, energy consumption, and fluctuating electricity prices. Moreover, to address real-world complexities, charging policies must make decisions efficiently across multiple time scales and remain scalable for large EB fleets. In this article, we propose a hierarchical deep reinforcement learning (HDRL) approach that reformulates the original Markov decision process (MDP) into two augmented MDPs. To solve these MDPs and enable multitimescale decision-making, we introduce a novel HDRL algorithm, namely, double actor-critic multiagent proximal policy optimization enhancement (DAC-MAPPO-E). Scalability challenges of the double actor-critic (DAC) algorithm for large-scale EB fleets are addressed through enhancements at both decision levels. 
At the high level, we redesign the decentralized actor network and integrate an attention mechanism to extract relevant global state information for each EB, decreasing the size of neural networks. At the low level, the multiagent proximal policy optimization (MAPPO) algorithm is incorporated into the DAC framework, enabling decentralized and coordinated charging power decisions, reducing computational complexity and enhancing convergence speed. Extensive experiments with real-world data demonstrate the superior performance and scalability of DAC-MAPPO-E in optimizing EB fleet charging schedules.\",\"PeriodicalId\":54347,\"journal\":{\"name\":\"IEEE Internet of Things Journal\",\"volume\":\"12 12\",\"pages\":\"22427-22442\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Internet of Things Journal\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10930901/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Internet of Things Journal","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10930901/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Optimizing Electric Bus Charging Scheduling With Uncertainties Using Hierarchical Deep Reinforcement Learning
The growing adoption of electric buses (EBs) represents a significant step toward sustainable development. By utilizing Internet of Things (IoT) systems, charging stations can autonomously determine charging schedules based on real-time data. However, optimizing EB charging schedules remains a critical challenge due to uncertainties in travel time, energy consumption, and fluctuating electricity prices. Moreover, to address real-world complexities, charging policies must make decisions efficiently across multiple time scales and remain scalable for large EB fleets. In this article, we propose a hierarchical deep reinforcement learning (HDRL) approach that reformulates the original Markov decision process (MDP) into two augmented MDPs. To solve these MDPs and enable multitimescale decision-making, we introduce a novel HDRL algorithm, namely, double actor-critic multiagent proximal policy optimization enhancement (DAC-MAPPO-E). Scalability challenges of the double actor-critic (DAC) algorithm for large-scale EB fleets are addressed through enhancements at both decision levels. At the high level, we redesign the decentralized actor network and integrate an attention mechanism to extract relevant global state information for each EB, decreasing the size of neural networks. At the low level, the multiagent proximal policy optimization (MAPPO) algorithm is incorporated into the DAC framework, enabling decentralized and coordinated charging power decisions, reducing computational complexity and enhancing convergence speed. Extensive experiments with real-world data demonstrate the superior performance and scalability of DAC-MAPPO-E in optimizing EB fleet charging schedules.
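The abstract's high-level enhancement, replacing a centralized actor with a per-EB actor that uses attention to pull relevant fleet-wide state, can be illustrated with a minimal sketch. This is not the authors' code: the PyTorch framing, the class name AttentionActor, and all layer sizes and state layouts are illustrative assumptions about how such an actor might be structured.

```python
# A hypothetical sketch of an attention-based decentralized actor, assuming a
# discrete charging action space; dimensions and architecture are illustrative.
import torch
import torch.nn as nn


class AttentionActor(nn.Module):
    """Per-EB actor: attends over all EBs' local states, so the network's
    parameter count stays fixed as the fleet grows."""

    def __init__(self, state_dim: int, action_dim: int, embed_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(state_dim, embed_dim)  # shared state encoder
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.policy_head = nn.Sequential(               # own embedding + fleet context -> logits
            nn.Linear(2 * embed_dim, 64), nn.ReLU(), nn.Linear(64, action_dim)
        )

    def forward(self, own_state: torch.Tensor, all_states: torch.Tensor):
        # own_state: (batch, state_dim); all_states: (batch, n_ebs, state_dim)
        q = self.encoder(own_state).unsqueeze(1)        # query built from this EB's state
        kv = self.encoder(all_states)                   # keys/values from the whole fleet
        context, _ = self.attn(q, kv, kv)               # attention extracts relevant global info
        h = torch.cat([q.squeeze(1), context.squeeze(1)], dim=-1)
        return self.policy_head(h)                      # logits over charging actions


# Usage: the output shape does not depend on fleet size, so the same actor
# can be reused as the number of EBs scales.
actor = AttentionActor(state_dim=8, action_dim=5)
own = torch.randn(2, 8)
fleet = torch.randn(2, 30, 8)                           # a 30-EB fleet in this example
print(actor(own, fleet).shape)                          # torch.Size([2, 5])
```

The design point matching the abstract is that attention summarizes the fleet into a fixed-size context vector per EB, which is what allows the network size to stay small regardless of how many buses are deployed.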
Journal Introduction:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impacts on sensor technologies, big data management, and future Internet design for applications like smart cities and smart homes. Fields of interest include: IoT architecture such as things-centric, data-centric, and service-oriented IoT architecture; IoT enabling technologies and systematic integration such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standards development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.