An exploration-driven framework for path planning in complex buildings using improved MADDPG

IF 6.7 | Zone 2, Engineering Technology | JCR Q1, Construction & Building Technology
Chong Zhang, Hong Liu, Wenhao Li
Journal of Building Engineering, vol. 107, Article 112626. Published 2025-04-11.
DOI: 10.1016/j.jobe.2025.112626
https://www.sciencedirect.com/science/article/pii/S2352710225008630
Citations: 0

Abstract

Multi-agent deep reinforcement learning (MADRL) methods have been extensively applied to crowd evacuation in complex building environments. However, the intricate architecture of modern buildings and high population densities hinder agent exploration efficiency, while existing approaches struggle to overcome the challenges posed by sparse rewards. To tackle these issues, this study proposes the Intrinsic Curiosity Distillation Multi-Agent Deep Deterministic Policy Gradient (ICD-MADDPG) algorithm, an exploration-driven framework for path planning in complex buildings. First, the ICD-MADDPG algorithm introduces a curiosity mechanism by integrating the Intrinsic Curiosity Module (ICM) and Random Network Distillation (RND), thereby refining the reward mechanism and significantly enhancing exploration efficiency. Next, the feature extraction process is enhanced using a Long Short-Term Memory (LSTM) network, enabling the model to effectively capture temporal dependencies in dynamic environments. Finally, a two-layer evacuation mechanism is adopted, where the crowd is divided into groups consisting of leaders and followers. Leaders utilize the ICD-MADDPG algorithm for global evacuation path planning, while followers employ the Reciprocal Velocity Obstacle (RVO) algorithm to follow the leaders and avoid collisions efficiently. Experimental results demonstrate that the ICD-MADDPG algorithm achieves superior rewards, improves evacuation efficiency, and effectively mitigates congestion. This framework provides a robust theoretical basis for optimizing evacuation strategies and offers practical value for intelligent building systems and emergency response planning.
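
The curiosity mechanism described above can be illustrated with a small sketch. The abstract does not say how the ICM and RND signals are combined or weighted, so everything below is an assumption made for illustration: the single-layer tanh networks, the additive mixing in `shaped_reward`, the weights `beta_icm` and `beta_rnd`, and all function and class names are hypothetical and are not the authors' implementation. The sketch only shows the general idea: an ICM-style bonus taken from forward-model prediction error, an RND-style bonus taken from the error of a trained predictor against a frozen random target, and both added to a sparse extrinsic reward.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random weight matrix for one tanh layer (toy stand-in for a real network)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def forward(weights, x):
    """Apply one tanh layer: y_j = tanh(sum_i W[j][i] * x[i])."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


class RNDBonus:
    """RND-style novelty bonus: predictor error against a frozen random target."""

    def __init__(self, n_in, n_out=4, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.target = make_linear(n_in, n_out, rng)     # frozen, never trained
        self.predictor = make_linear(n_in, n_out, rng)  # trained on visited states
        self.lr = lr

    def bonus(self, state):
        """Mean squared prediction error for this state."""
        t = forward(self.target, state)
        p = forward(self.predictor, state)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)

    def update(self, state):
        """One SGD step that moves the predictor toward the target on this state."""
        t = forward(self.target, state)
        p = forward(self.predictor, state)
        for j, row in enumerate(self.predictor):
            # Gradient of (p_j - t_j)^2 w.r.t. the pre-activation of output j.
            grad_pre = 2.0 * (p[j] - t[j]) * (1.0 - p[j] ** 2)
            for i in range(len(row)):
                row[i] -= self.lr * grad_pre * state[i]


def icm_bonus(predicted_next_features, next_features):
    """ICM-style bonus: half the squared error of a forward model in feature space."""
    return 0.5 * sum(
        (p - f) ** 2 for p, f in zip(predicted_next_features, next_features)
    )


def shaped_reward(r_ext, r_icm, r_rnd, beta_icm=0.5, beta_rnd=0.5):
    """Assumed additive mix of the sparse extrinsic reward and both bonuses."""
    return r_ext + beta_icm * r_icm + beta_rnd * r_rnd


if __name__ == "__main__":
    rnd = RNDBonus(n_in=3)
    observation = [0.2, -0.5, 0.9]  # hypothetical leader observation
    before = rnd.bonus(observation)
    for _ in range(200):
        rnd.update(observation)
    after = rnd.bonus(observation)
    print("RND bonus before/after repeated visits:", before, after)
    print("shaped reward:", shaped_reward(0.0, icm_bonus([1.0, 0.0], [0.0, 0.0]), after))
```

In this toy setup the RND bonus for a state shrinks as that state is visited repeatedly, which is what lets the shaped reward steer an agent toward less-visited regions when the extrinsic reward is zero almost everywhere.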
Source journal
Journal of Building Engineering (Engineering: Civil and Structural Engineering)
CiteScore: 10.00
Self-citation rate: 12.50%
Articles published: 1901
Review time: 35 days
About the journal: The Journal of Building Engineering is an interdisciplinary journal that covers all aspects of science and technology concerned with the whole life cycle of the built environment, from the design phase through to construction, operation, performance, maintenance and its deterioration.