Learning the optimal state-feedback using deep networks
Carlos Sánchez-Sánchez, D. Izzo, Daniel Hennes
2016 IEEE Symposium Series on Computational Intelligence (SSCI), 2016-12-01
DOI: 10.1109/SSCI.2016.7850105 (https://doi.org/10.1109/SSCI.2016.7850105)
Citations: 26
Abstract
We investigate the use of deep artificial neural networks to approximate the optimal state-feedback control of continuous-time, deterministic, non-linear systems. The networks are trained in a supervised manner using trajectories generated by solving the optimal control problem via the Hermite-Simpson transcription method. We find that deep networks are able to represent the optimal state-feedback with high accuracy and precision well outside the training area. We consider non-linear dynamical models under different cost functions that result in both smooth and discontinuous (bang-bang) optimal control solutions. In particular, we investigate the inverted pendulum swing-up and stabilization, a multicopter pin-point landing, and a spacecraft free landing problem. Across all domains, we find that deep networks significantly outperform shallow networks in their ability to build an accurate functional representation of the optimal control. In the case of spacecraft and multicopter landing, deep networks are able to achieve safe landings consistently even when starting well outside of the training area.
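The abstract names the Hermite-Simpson transcription method as the source of the training trajectories. As a rough illustration of what that transcription enforces, the sketch below evaluates the compressed Hermite-Simpson defect constraints on a uniform time grid; a nonlinear programming solver would drive these defects to zero. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the unit-parameter pendulum model, and the choice of the midpoint control as the average of the two node controls.

```python
import numpy as np


def pendulum(x, u):
    # Hypothetical unit-parameter pendulum (an assumption, not the paper's model):
    # state x = [theta, omega], torque u, theta = 0 is the hanging position.
    theta, omega = x
    return np.array([omega, -np.sin(theta) + u])


def hermite_simpson_defects(f, xs, us, h):
    """Return the compressed Hermite-Simpson defects on a uniform grid.

    xs: (N+1, n) states at the grid nodes
    us: (N+1,) or (N+1, m) controls at the grid nodes
    h:  step size between consecutive nodes
    The midpoint control is taken as the average of the node controls
    (an illustrative assumption).
    """
    xs = np.asarray(xs, dtype=float)
    us = np.asarray(us, dtype=float)
    defects = []
    for k in range(len(xs) - 1):
        fk = f(xs[k], us[k])
        fk1 = f(xs[k + 1], us[k + 1])
        # Hermite interpolant of the state at the interval midpoint
        xc = 0.5 * (xs[k] + xs[k + 1]) + h / 8.0 * (fk - fk1)
        uc = 0.5 * (us[k] + us[k + 1])
        fc = f(xc, uc)
        # Simpson quadrature of the dynamics over the interval
        defects.append(xs[k + 1] - xs[k] - h / 6.0 * (fk + 4.0 * fc + fk1))
    return np.array(defects)


# A trajectory resting at the hanging equilibrium with zero torque
xs = np.zeros((5, 2))
us = np.zeros(5)
print(hermite_simpson_defects(pendulum, xs, us, 0.1))
```

In a full transcription, the node states and controls would be the decision variables, the defects would enter as equality constraints, and the cost function would be approximated with the same quadrature. The resulting state-control pairs along each optimal trajectory are what a network would then be trained on in the supervised setting the abstract describes.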