Back-off action selection in summary space-based POMDP dialogue systems

Milica Gasic, F. Lefèvre, Filip Jurcícek, Simon Keizer, François Mairesse, Blaise Thomson, Kai Yu, S. Young

2009 IEEE Workshop on Automatic Speech Recognition & Understanding, December 2009. DOI: 10.1109/ASRU.2009.5373416. Citations: 15.
This paper deals with the issue of invalid state-action pairs in the Partially Observable Markov Decision Process (POMDP) framework, with a focus on real-world tasks where the need for approximate solutions exacerbates this problem. In particular, when modelling dialogue as a POMDP, both the state and the action space must be reduced to smaller scale summary spaces in order to make learning tractable. However, since not all actions are valid in all states, the action proposed by the policy in summary space sometimes leads to an invalid action when mapped back to master space. Some form of back-off scheme must then be used to generate an alternative action. This paper demonstrates how the value function derived during reinforcement learning can be used to order back-off actions in an N-best list. Compared to a simple baseline back-off strategy and to a strategy that extends the summary space to minimise the occurrence of invalid actions, the proposed N-best action selection scheme is shown to be significantly more robust.
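The core idea — rank the candidate summary actions by their learned value, then walk down the N-best list until one maps to a valid master-space action — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names (`q_value`, `to_master`, `is_valid`) and the representation of beliefs and actions are assumptions for the sake of the example.

```python
# Hypothetical sketch of N-best back-off action selection in a summary-space
# POMDP dialogue policy. Summary actions are ordered by their Q-values under
# the current belief state; the first one whose mapping back to master space
# yields a valid action is executed. All identifiers are illustrative.

def select_action(belief, summary_actions, q_value, to_master, is_valid):
    """Return the highest-valued summary action that maps to a valid master action.

    belief          -- current belief state (opaque to this sketch)
    summary_actions -- iterable of candidate actions in summary space
    q_value(b, a)   -- learned value function from reinforcement learning
    to_master(b, a) -- maps a summary action to master space, or None on failure
    is_valid(b, m)  -- whether master action m is executable in belief state b
    """
    # Order the summary actions into an N-best list by descending value.
    n_best = sorted(summary_actions, key=lambda a: q_value(belief, a), reverse=True)
    # Back off down the list until a valid master action is found.
    for action in n_best:
        master = to_master(belief, action)
        if master is not None and is_valid(belief, master):
            return master
    raise RuntimeError("no candidate summary action maps to a valid master action")
```

Under this scheme, the simple baseline the paper compares against would correspond to a fixed fallback action rather than descending the value-ordered list.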