An exploration-driven framework for path planning in complex buildings using improved MADDPG

IF 6.7, Zone 2 (Engineering Technology), JCR Q1 (Construction & Building Technology)

Journal of Building Engineering. Pub Date: 2025-04-11. DOI: 10.1016/j.jobe.2025.112626

Chong Zhang, Hong Liu, Wenhao Li

Volume 107, Article 112626. Full text: https://www.sciencedirect.com/science/article/pii/S2352710225008630

Citations: 0

Abstract

Multi-agent deep reinforcement learning (MADRL) methods have been extensively applied to crowd evacuation in complex building environments. However, the intricate architecture of modern buildings and high population densities hinder agent exploration efficiency, while existing approaches struggle to overcome the challenges posed by sparse rewards. To tackle these issues, this study proposes the Intrinsic Curiosity Distillation Multi-Agent Deep Deterministic Policy Gradient (ICD-MADDPG) algorithm, an exploration-driven framework for path planning in complex buildings. First, the ICD-MADDPG algorithm introduces a curiosity mechanism by integrating the Intrinsic Curiosity Module (ICM) and Random Network Distillation (RND), thereby refining the reward mechanism and significantly enhancing exploration efficiency. Next, the feature extraction process is enhanced using a Long Short-Term Memory (LSTM) network, enabling the model to effectively capture temporal dependencies in dynamic environments. Finally, a two-layer evacuation mechanism is adopted, where the crowd is divided into groups consisting of leaders and followers. Leaders utilize the ICD-MADDPG algorithm for global evacuation path planning, while followers employ the Reciprocal Velocity Obstacle (RVO) algorithm to follow the leaders and avoid collisions efficiently. Experimental results demonstrate that the ICD-MADDPG algorithm achieves superior rewards, improves evacuation efficiency, and effectively mitigates congestion. This framework provides a robust theoretical basis for optimizing evacuation strategies and offers practical value for intelligent building systems and emergency response planning.
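The abstract states that ICM and RND are integrated to refine the reward under sparse extrinsic feedback, but it does not give the formulation. The following is a minimal sketch of one common way to do this, not the authors' implementation. It assumes linear maps in place of neural networks, a fixed random encoder for the ICM forward model (ICM normally learns the encoder through an inverse model), and an additive combination with hypothetical weights `beta_icm` and `beta_rnd`. All class and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


class RND:
    """Random Network Distillation: the bonus is the error of a trained
    predictor against a frozen, randomly initialised target map."""

    def __init__(self, obs_dim, feat_dim, lr=0.5):
        self.target = rng.normal(scale=0.5, size=(obs_dim, feat_dim))  # frozen
        self.predictor = np.zeros((obs_dim, feat_dim))                 # trained
        self.lr = lr

    def _error(self, obs):
        return obs @ self.predictor - obs @ self.target

    def bonus(self, obs):
        return float(np.mean(self._error(obs) ** 2))

    def update(self, obs):
        # Normalised gradient step on the squared prediction error.
        step = np.outer(obs, self._error(obs)) / (1.0 + obs @ obs)
        self.predictor -= self.lr * step


class ICMForward:
    """Forward-model half of ICM: the bonus is the error in predicting
    next-state features from current features and the action.
    Assumption: the encoder is fixed and random in this sketch."""

    def __init__(self, obs_dim, act_dim, feat_dim, lr=0.5):
        self.encoder = rng.normal(scale=0.5, size=(obs_dim, feat_dim))
        self.forward = np.zeros((feat_dim + act_dim, feat_dim))
        self.lr = lr

    def _error(self, obs, act, next_obs):
        x = np.concatenate([obs @ self.encoder, act])
        return x, x @ self.forward - next_obs @ self.encoder

    def bonus(self, obs, act, next_obs):
        _, err = self._error(obs, act, next_obs)
        return 0.5 * float(np.sum(err ** 2))

    def update(self, obs, act, next_obs):
        x, err = self._error(obs, act, next_obs)
        self.forward -= self.lr * np.outer(x, err) / (1.0 + x @ x)


def shaped_reward(extrinsic, icm_bonus, rnd_bonus, beta_icm=0.5, beta_rnd=0.5):
    """Additive mix of extrinsic and intrinsic terms (hypothetical weights)."""
    return extrinsic + beta_icm * icm_bonus + beta_rnd * rnd_bonus


# Revisit the same transition with zero extrinsic reward (the sparse case).
obs = np.array([1.0, 0.0, -1.0, 0.5])
act = np.array([0.0, 1.0])
next_obs = np.array([1.0, 0.5, -1.0, 0.5])

rnd = RND(obs_dim=4, feat_dim=8)
icm = ICMForward(obs_dim=4, act_dim=2, feat_dim=8)

rewards = []
for _ in range(20):
    rewards.append(shaped_reward(0.0, icm.bonus(obs, act, next_obs), rnd.bonus(next_obs)))
    icm.update(obs, act, next_obs)
    rnd.update(next_obs)

print(rewards[0], rewards[-1])
```

In this sketch the shaped reward for a repeatedly visited transition shrinks as both predictors improve, so the curiosity signal stays high only for states and transitions the agent has rarely seen. In a MADDPG setting, a reward of this kind would replace the raw environment reward stored for each leader agent.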


Source journal

Journal of Building Engineering (Engineering: Civil and Structural Engineering)

CiteScore: 10.00

Self-citation rate: 12.50%

Publications: 1901

Review time: 35 days

About the journal: The Journal of Building Engineering is an interdisciplinary journal that covers all aspects of science and technology concerned with the whole life cycle of the built environment, from the design phase through construction, operation, performance, and maintenance to deterioration.