Glyph-Based Visual Analysis of Q-Learning Based Action Policy Ensembles on Racetrack
David Groß, M. Klauck, Timo P. Gros, Marcel Steinmetz, Jörg Hoffmann, S. Gumhold
2022 26th International Conference Information Visualisation (IV), July 2022. DOI: 10.1109/IV56949.2022.00011
Recently, deep reinforcement learning has become very successful at making complex decisions, achieving super-human performance in Go, chess, and challenging video games. When applied to safety-critical applications, however, such as the control of cyber-physical systems with a learned action policy, the need for certification arises. To empower domain experts to decide whether to trust a learned action policy, we propose visualization methods for a detailed assessment of action policies implemented as neural networks trained with Q-learning. We propose a highly responsive visual analysis tool that fosters efficient analysis of Q-learning-based action policies over the complete state space of the system, which is essential for verification and for gaining detailed insights into policy quality. For efficient visual inspection of the per-action Q-value rating over the state space, we designed three glyphs that provide different levels of detail. In particular, we introduce the two-dimensional Q-Glyph, which visually encodes Q-values in a compact manner while preserving directional information of the actions. Placing glyphs in ordered stacks allows for simultaneous inspection of policy ensembles that result, for example, from Q-learning meta-parameter studies. Further analysis of the policy is supported by enabling inspection of individual traces generated from a chosen start state. A user study was conducted to evaluate the effectiveness of our tool applied to the Racetrack case study, a commonly used benchmark in the AI community that abstracts driving control.
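The policies the tool inspects are trained with Q-learning, which maintains a per-state, per-action value estimate Q(s, a); the glyphs visualize exactly these per-action values, and the greedy policy is read off by taking the argmax action in each state. As a rough illustration only (the paper trains neural-network policies on the full two-dimensional Racetrack benchmark, not this toy setup), a minimal tabular Q-learning sketch on a hypothetical one-dimensional track might look like:

```python
import random

# Toy 1-D "track": states 0..GOAL, actions coast (0) or accelerate (+1).
# Hypothetical example for illustration; not the paper's implementation.
GOAL = 5
ACTIONS = [0, 1]
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1   # learning rate, discount, exploration

# Tabular Q-values: one entry per (state, action) pair
Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}

def step(state, action):
    """Deterministic toy dynamics: accelerating moves one cell forward."""
    nxt = min(state + action, GOAL)
    reward = 10.0 if nxt == GOAL else -1.0   # per-step penalty, goal bonus
    return nxt, reward

random.seed(0)
for _ in range(500):                         # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Q-learning update: bootstrap from the best next-state action
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# The greedy policy induced by the per-action Q-values -- the quantity
# the paper's glyphs make visually inspectable over the state space
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}
print(policy)
```

Inspecting the learned table shows why per-action values matter for trust: the gap between Q(s, accelerate) and Q(s, coast) in each state indicates how decisively the policy prefers its chosen action, which is the kind of information the Q-Glyph encodes compactly.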