Traffic Signal Control Using Deep Reinforcement Learning with Multiple Resources of Rewards

Dunhao Zhong, A. Boukerche
Ad Hoc & Sensor Wireless Networks, published 2019-11-25
DOI: 10.1145/3345860.3361522
Intelligent traffic signal control is an effective way to address traffic congestion in the real world. One trend is to use Deep Reinforcement Learning (DRL) to control traffic signals based on snapshots of the traffic state. Most existing work frames multiple objectives, such as minimizing waiting time and queue length, with a single numeric reward. This overlooks the fact that one reward for multiple objectives misleads agents into taking wrong actions in certain states, which in turn causes traffic fluctuations. In this paper, we propose a DRL-based framework that uses multiple rewards for multiple objectives. The framework addresses the difficulty of assessing behaviours with a single numeric reward and aims to control traffic flows more steadily. We evaluated our approach on both synthetic traffic scenarios and a real-world traffic dataset from Toronto. The results show that our approach outperformed single-reward approaches.
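The abstract's central argument is that folding several objectives into one number hides what the agent is trading off. The sketch below illustrates that point under loudly stated assumptions. The abstract does not give the state representation, the reward definitions, or how the rewards are combined, so everything here is hypothetical and is not the authors' framework: `reward_components` (negated total waiting time and negated queue length), `scalarize`, the `MultiRewardQ` class, the 50/50 `weights`, and the string states are all illustrative. It also swaps the deep network for plain tabular Q-learning, keeping one Q-table per reward component and choosing actions by a weighted sum of the tables.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical reward components for one signal-control step (assumption:
# the paper's exact definitions are not given in the abstract).
def reward_components(total_wait_s: float, queue_len: int) -> tuple[float, float]:
    """Return (r_wait, r_queue); both are penalties, so larger is better."""
    return (-total_wait_s, -float(queue_len))


def scalarize(rewards: tuple[float, ...], weights: tuple[float, ...]) -> float:
    """Single-reward baseline: collapse all objectives into one number."""
    return sum(w * r for w, r in zip(weights, rewards))


@dataclass
class MultiRewardQ:
    """Hypothetical tabular Q-learner with one Q-table per reward component."""
    n_actions: int
    weights: tuple[float, ...]
    alpha: float = 0.1   # learning rate
    gamma: float = 0.9   # discount factor
    # tables[k][state] holds Q_k(state, a) for every action a.
    tables: list = field(init=False, repr=False, default_factory=list)

    def __post_init__(self) -> None:
        self.tables = [defaultdict(lambda: [0.0] * self.n_actions)
                       for _ in self.weights]

    def combined(self, state) -> list[float]:
        """Weighted sum of the per-objective Q-values, used to pick actions."""
        return [sum(w * t[state][a] for w, t in zip(self.weights, self.tables))
                for a in range(self.n_actions)]

    def greedy(self, state) -> int:
        """Action with the highest combined value (ties go to the lowest index)."""
        q = self.combined(state)
        return max(range(self.n_actions), key=q.__getitem__)

    def update(self, s, a: int, rewards: tuple[float, ...], s_next) -> None:
        """One Q-learning step per head, each driven by its own reward."""
        # All heads bootstrap from the same greedy next action so the
        # per-objective estimates stay consistent with the executed policy.
        a_next = self.greedy(s_next)
        for t, r in zip(self.tables, rewards):
            target = r + self.gamma * t[s_next][a_next]
            t[s][a] += self.alpha * (target - t[s][a])


if __name__ == "__main__":
    w = (0.5, 0.5)
    # Two very different outcomes collapse to the same scalar reward.
    long_wait = reward_components(total_wait_s=10.0, queue_len=2)   # (-10.0, -2.0)
    long_queue = reward_components(total_wait_s=2.0, queue_len=10)  # (-2.0, -10.0)
    print(scalarize(long_wait, w), scalarize(long_queue, w))  # -6.0 -6.0

    # The per-objective tables keep the two penalties apart.
    agent = MultiRewardQ(n_actions=2, weights=w)
    agent.update("s0", 0, long_wait, "s1")
    print(agent.tables[0]["s0"], agent.tables[1]["s0"])  # [-1.0, 0.0] [-0.2, 0.0]
```

In this linear, tabular form the weighted sum of the heads evolves exactly like Q-learning on the scalarized reward, so the sketch only shows what decomposition preserves: a separate, inspectable estimate for each objective. How the paper combines its rewards to obtain steadier control is not described in the abstract, so treat the linear weighting as a placeholder.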
About the journal:
Ad Hoc & Sensor Wireless Networks seeks to provide an opportunity for researchers from computer science, engineering and mathematical backgrounds to disseminate and exchange knowledge in the rapidly emerging field of ad hoc and sensor wireless networks. It will comprehensively cover physical, data-link, network and transport layers, as well as application, security, simulation and power management issues in sensor, local area, satellite, vehicular, personal, and mobile ad hoc networks.