Representing the Reinforcement Learning state in a negotiation dialogue
P. Heeman
2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009-12-01
DOI: 10.1109/ASRU.2009.5373413 (https://doi.org/10.1109/ASRU.2009.5373413)
Citations: 27
Abstract
Most applications of Reinforcement Learning (RL) to dialogue have focused on slot-filling tasks. In this paper, we explore a task that requires negotiation, in which conversants need to exchange information in order to decide on a good solution. We investigate what information should be included in the system's RL state so that an optimal policy can be learned while the state space stays reasonable in size. We propose keeping track of the decisions that the system has made and using them to constrain the system's future behavior in the dialogue. In this way, we can compositionally represent the strategy that the system is employing. We show that this approach is able to learn a good policy for the task. This work is a first step toward a more general exploration of applying RL to negotiation dialogues.
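As a rough illustration of the idea of recording decisions in the RL state and using them to constrain later behavior, here is a minimal Python sketch. It is not the paper's state representation: the action names, the decision names, the `FORBIDDEN_AFTER` table, and the helpers `allowed_actions` and `commit` are all hypothetical, chosen only to show how a set of recorded decisions could prune the actions a policy may choose from.

```python
from dataclasses import dataclass

# Hypothetical dialogue actions for a toy negotiation (not taken from the paper).
ACTIONS = ("propose_best", "propose_any", "ask_partner", "accept", "reject")

# Hypothetical constraints: once a decision is recorded in the state,
# the listed actions are no longer available to the system.
FORBIDDEN_AFTER = {
    "only_propose_best": {"propose_any"},
    "wait_for_partner": {"propose_best", "propose_any"},
}


@dataclass(frozen=True)
class State:
    """Toy RL state: one dialogue feature plus the decisions made so far."""
    pending_proposal: bool = False      # is a partner proposal on the table?
    decisions: frozenset = frozenset()  # decisions the system has committed to


def allowed_actions(state):
    """Return the actions still permitted given the recorded decisions."""
    forbidden = set()
    for decision in state.decisions:
        forbidden |= FORBIDDEN_AFTER.get(decision, set())
    actions = [a for a in ACTIONS if a not in forbidden]
    # Accepting or rejecting only makes sense when there is a proposal.
    if not state.pending_proposal:
        actions = [a for a in actions if a not in ("accept", "reject")]
    return actions


def commit(state, decision):
    """Return a new state with one more decision recorded."""
    return State(state.pending_proposal, state.decisions | {decision})


start = State()
strict = commit(start, "only_propose_best")
print(allowed_actions(start))
print(allowed_actions(strict))
```

Because `State` is frozen and stores its decisions in a `frozenset`, it is hashable and could serve directly as a key in a tabular value function. In this sketch, the strategy is simply the combination of decisions recorded so far, which is one possible reading of the compositional representation described above.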