Low-Resource Neural Machine Translation with Neural Episodic Control
Nier Wu, H. Hou, Shuo Sun, Wei Zheng
2021 International Joint Conference on Neural Networks (IJCNN), published 2021-07-18
DOI: 10.1109/IJCNN52387.2021.9533677 (https://doi.org/10.1109/IJCNN52387.2021.9533677)
Citations: 0
Abstract
Reinforcement learning (RL) has been shown to alleviate two training-evaluation mismatches in neural machine translation (NMT): metric inconsistency and exposure bias. However, its sample efficiency is limited by the sampling method, whether Temporal-Difference (TD) or Monte-Carlo (MC), and it still cannot compensate for the scarcity of non-zero rewards caused by insufficient training data. In addition, RL rewards are effective only once the model parameters are largely determined. We therefore propose an episodic control reinforcement learning method. It obtains a model with largely determined parameters through knowledge transfer, and it records historical action trajectories in a semi-tabular differentiable neural dictionary (DND), so that the model can quickly approximate the true state value from sampled rewards when updating the policy. We evaluate the method on the CCMT2019 Mongolian-Chinese (Mo-Zh), Tibetan-Chinese (Ti-Zh), and Uyghur-Chinese (Ug-Zh) tasks; the results show a significant improvement in translation quality, demonstrating the effectiveness of the method.
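The abstract does not specify how the DND is implemented, so the following is only a minimal sketch of the idea, assuming the general neural episodic control formulation: a memory of (key, value) pairs that is read with inverse-distance kernel weights over the nearest keys and written with a tabular-style update. The class name `DND` and the parameters `num_neighbors`, `delta`, and `lr` are hypothetical names chosen for illustration, and the keys here are plain vectors rather than decoder states from an NMT model.

```python
import numpy as np


class DND:
    """Toy differentiable neural dictionary: an append-only key/value memory.

    Assumption: this follows the generic neural episodic control recipe,
    not necessarily the exact variant used in the paper.
    """

    def __init__(self, key_dim, num_neighbors=3, delta=1e-3, lr=0.5):
        self.keys = np.empty((0, key_dim))   # stored state embeddings
        self.values = np.empty(0)            # stored value estimates
        self.num_neighbors = num_neighbors   # neighbours used per lookup
        self.delta = delta                   # keeps the kernel finite at distance 0
        self.lr = lr                         # step size for updating an existing entry

    def lookup(self, query):
        """Estimate a value as a kernel-weighted average of the nearest entries."""
        if len(self.values) == 0:
            return 0.0
        dists = np.sum((self.keys - query) ** 2, axis=1)
        nearest = np.argsort(dists)[: self.num_neighbors]
        weights = 1.0 / (dists[nearest] + self.delta)  # inverse-distance kernel
        weights /= weights.sum()
        return float(weights @ self.values[nearest])

    def write(self, key, target):
        """Move an existing entry toward the target, or append a new entry."""
        if len(self.values) > 0:
            dists = np.sum((self.keys - key) ** 2, axis=1)
            i = int(np.argmin(dists))
            if dists[i] < 1e-12:  # key already stored: tabular-style update
                self.values[i] += self.lr * (target - self.values[i])
                return
        self.keys = np.vstack([self.keys, key])
        self.values = np.append(self.values, target)


memory = DND(key_dim=2)
memory.write(np.array([0.0, 0.0]), 1.0)    # state with a high observed return
memory.write(np.array([10.0, 10.0]), 0.0)  # distant state with zero return
estimate = memory.lookup(np.array([0.1, 0.0]))
print(f"value estimate near the first key: {estimate:.3f}")
```

Because a lookup reuses returns already stored for similar states, a value estimate is available after very few samples, which is the sample-efficiency argument the abstract makes for episodic control.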