Beyond game environments: Evolutionary algorithms with parameter space noise for task-oriented dialogue policy exploration
Qingxin Xiao, Yangyang Zhao, Lingwei Dang, Yun Hao, Le Che, Qingyao Wu
Neurocomputing 648 (2025), Article 130639. Published 2025-06-13. DOI: 10.1016/j.neucom.2025.130639
Reinforcement learning (RL) has achieved significant success in task-oriented dialogue (TOD) policy learning. Nevertheless, training a dialogue policy through RL faces a critical challenge: insufficient exploration, which leaves the policy trapped in local optima. Evolutionary algorithms (EAs) increase exploration breadth by maintaining and selecting diverse individuals, and they often add parameter-space noise to different individuals to simulate mutation, thereby increasing exploration depth. This approach has proven effective for enhancing RL exploration and has shown promising results in game domains. However, previous research has not analyzed its effectiveness for TOD policy learning. Given the substantial differences between game environments and TOD policy learning, this paper explores and validates the efficacy of EAs for TOD policies, investigating the effects of different evolutionary cycles and noise strategies across dialogue tasks to determine which combination of evolutionary cycle and noise strategy is most suitable. Additionally, we propose an adaptive noise evolution method that dynamically adjusts noise scales to improve exploration efficiency. Experiments on the MultiWOZ dataset demonstrate significant performance improvements, achieving state-of-the-art results in both on-policy and off-policy settings.
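The mechanism the abstract describes (maintaining a population of policy parameter vectors, mutating them with Gaussian parameter-space noise, selecting the fittest individuals, and adapting the noise scale over generations) can be illustrated with a minimal sketch. This is not the paper's implementation: the fitness function, the elite-selection scheme, and the shrink/grow adaptation rule below are all illustrative assumptions.

```python
import numpy as np

def evolve(fitness, dim=8, pop_size=20, n_elite=5,
           sigma=0.1, generations=50, seed=0):
    """Evolutionary search with parameter-space noise mutation and an
    adaptive noise scale (illustrative sketch, not the paper's method)."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, dim))   # population of policy parameter vectors
    best_score, best_params = -np.inf, pop[0].copy()
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        order = np.argsort(scores)[::-1]     # fittest individuals first
        improved = scores[order[0]] > best_score
        if improved:
            best_score, best_params = scores[order[0]], pop[order[0]].copy()
        # Adaptive noise scale (assumed rule): shrink while improving,
        # widen when progress stalls to help escape local optima.
        sigma *= 0.9 if improved else 1.1
        elites = pop[order[:n_elite]]        # selection keeps top performers
        parents = elites[rng.integers(n_elite, size=pop_size - n_elite)]
        # Mutation: Gaussian noise applied directly in parameter space.
        children = parents + sigma * rng.normal(size=parents.shape)
        pop = np.vstack([elites, children])
    return best_params, best_score

# Toy usage: a concave stand-in for dialogue success rate.
params, score = evolve(lambda p: -float(np.sum((p - 1.0) ** 2)))
```

In the paper's setting, `fitness` would be an episode-level dialogue success metric obtained from a MultiWOZ user simulator rather than the toy quadratic used here.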
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions to the field of neurocomputing. The journal covers neurocomputing theory, practice, and applications.