Zero-shot sim-to-real transfer using Siamese-Q-Based reinforcement learning

Zhenyu Zhang, Shaorong Xie, Han Zhang, Xiangfeng Luo, Hang Yu

Information Fusion, Volume 114, Article 102664. Published 2024-09-06. DOI: 10.1016/j.inffus.2024.102664
To address real-world decision problems with reinforcement learning, it is common to first train a policy in a simulator for safety. Unfortunately, the sim-to-real gap hinders effective transfer unless substantial real training data are available. Collecting real samples of complex tasks is often impractical, however, and the sample inefficiency of reinforcement learning exacerbates the sim-to-real problem even when online interaction or real data are available. Representation learning can improve sample efficiency while preserving generalization by projecting high-dimensional inputs into low-dimensional representations. However, whether trained independently of or jointly with reinforcement learning, representation learning remains a separate auxiliary task, lacking the task-related features and generalization needed for simulation-to-real transfer. This paper proposes Siamese-Q, a new representation learning method that employs Siamese networks for zero-shot simulation-to-real transfer: it narrows the distance, with respect to Q values, between inputs with the same semantics in the latent space. This fuses task-related information into the representation and improves the generalization of the policy. Evaluation in virtual and real autonomous vehicle scenarios demonstrates substantial improvements of 19.5% and 94.2%, respectively, over conventional representation learning, without requiring any real-world observations or on-policy interaction, enabling reinforcement learning policies trained in simulation to transfer to reality.
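To make the idea concrete, below is a minimal sketch of how a Siamese-Q-style auxiliary objective might look in a PyTorch DQN-like setup. The abstract does not specify the architecture or loss, so the class and method names, the MLP encoder, the pairing of an observation with a semantically identical view (e.g., an augmented or domain-randomized copy produced in simulation), and the stop-gradient on the second branch are all illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a Siamese-Q-style auxiliary objective (PyTorch assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseQAgent(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int, num_actions: int):
        super().__init__()
        # One shared (Siamese) encoder: both views pass through the same weights.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Q-head mapping the latent representation to per-action values.
        self.q_head = nn.Linear(latent_dim, num_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.q_head(self.encoder(obs))

    def siamese_q_loss(self, obs: torch.Tensor,
                       obs_same_semantics: torch.Tensor) -> torch.Tensor:
        # Encode both semantically identical views with the shared encoder.
        z1 = self.encoder(obs)
        z2 = self.encoder(obs_same_semantics)
        # Narrow the latent-space distance between the two views ...
        latent_term = F.mse_loss(z1, z2.detach())
        # ... and keep them consistent "with respect to Q values": both views
        # should induce the same action-value profile.
        q_term = F.mse_loss(self.q_head(z1), self.q_head(z2).detach())
        return latent_term + q_term
```

Under these assumptions, this auxiliary loss would be added to a standard TD loss during training, with the paired view generated entirely in simulation, so no real-world observations or on-policy interaction are required.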
Journal introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.