Qi Liu, Yujie Tang, Xueyuan Li, Kaifeng Wang, Fan Yang, Zirui Li
Transportation Research Part C-Emerging Technologies, volume 177, Article 105183. Published 2025-06-03. DOI: 10.1016/j.trc.2025.105183
Impact factor: 7.6. JCR: Q1 (Transportation Science & Technology).
https://www.sciencedirect.com/science/article/pii/S0968090X25001871
Curiosity-driven reinforcement learning with graph transformers for decision-making in connected and autonomous vehicles
Cooperative decision-making for connected and autonomous vehicles (CAVs) in mixed-autonomy traffic is critical to the advancement of modern intelligent transportation systems. Recently, graph reinforcement learning (GRL) approaches have shown remarkable success in addressing decision-making challenges by leveraging graph-based technologies. However, existing GRL-based research faces substantial challenges in generating accurate feature embeddings to enhance driving policies, thoroughly exploring the driving environment, and efficiently training models. To address these challenges, this paper proposes a graph transformer reinforcement learning method with a distributional curiosity mechanism to improve feature generation efficiency and environment exploration, ultimately boosting the decision-making performance of CAVs. First, an improved transformed graph convolutional network (ITransGCN) is proposed, integrating a graph convolutional network (GCN), rotary position encoding (ROPE), and a temporal prior attention mechanism to strengthen sequential modeling capabilities, thereby generating informative spatial–temporal feature embeddings. Then, a curiosity mechanism based on distributional random network distillation (DRND) is proposed to enhance the exploratory capabilities of CAVs in driving environments. Additionally, a temporal integrated deep reinforcement learning (TI-DRL) model is developed, incorporating an auxiliary loss that integrates spatial–temporal information to improve the model’s ability to capture spatial–temporal dependencies. Finally, a cooperation-aware reward function is constructed to further evaluate the performance of CAVs. Comprehensive experiments are conducted across three representative traffic scenarios to validate the proposed method. The results demonstrate that the proposed method outperforms the baselines in driving safety, efficiency, and model stability, highlighting the effectiveness of its core components and its generalization capability.
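The abstract names rotary position encoding as one ingredient of ITransGCN but does not say how it is applied there. As a rough illustration only, the sketch below shows the standard rotary encoding in plain Python, not the paper's implementation. The function names `rope` and `dot`, the base of 10000, and the example vectors are all assumptions made for this example. It demonstrates the property that makes the encoding useful for sequential modeling: the dot product of two rotated vectors depends only on the difference between their positions.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply a standard rotary position encoding to one vector.

    Hypothetical helper for illustration; consecutive pairs of
    components are rotated by an angle proportional to `position`.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("rotary encoding needs an even dimension")
    out = []
    for i in range(d // 2):
        # Each pair uses its own frequency base^(-2i/d).
        angle = position * base ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Made-up query and key vectors (not data from the paper).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Both pairs of positions are 3 steps apart, so the scores should agree.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 13), rope(k, 10))
```

Because each pair of components is only rotated, the vector's length is unchanged and the attention score between two time steps reflects their relative offset rather than their absolute indices.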
About the journal:
Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.