Reward Only Training of Encoder-Decoder Digit Recognition Systems Based on Policy Gradient Methods
Yilong Peng, Hayato Shibata, T. Shinozaki
2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), November 2018
DOI: 10.23919/APSIPA.2018.8659527
Citations: 0
Abstract
Zero-resource speech recognition is attracting attention for both engineering and scientific purposes. However, existing unsupervised learning frameworks that use only speech input cannot associate automatically discovered linguistic units with spellings and concepts. In this paper, we propose an approach that uses a scalar reward assumed to be given for each decoding result of an utterance. While the approach is straightforward to formulate with reinforcement learning, the difficulty lies in obtaining convergence without the help of supervised learning. Focusing on encoder-decoder based speech recognition, we explore several neural network architectures, optimization methods, and reward definitions, seeking a suitable configuration for policy gradient reinforcement learning. Experiments were performed using connected digit utterances from the TIDIGITS corpus as training and evaluation sets. While the task is challenging, we show that learning a connected digit recognition system is possible, achieving a digit error rate of 13.6%. The success depends largely on the configuration, and we reveal appropriate conditions that differ substantially from those of supervised training.
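The core idea of reward-only training with policy gradients can be sketched on a toy problem: a stochastic policy samples a decoding result, receives only a scalar reward, and its parameters are updated with the REINFORCE gradient. The sketch below is our illustration, not the paper's encoder-decoder model; the target digit, reward definition, learning rate, and running-mean baseline are all illustrative assumptions.

```python
import numpy as np

# Toy REINFORCE sketch: a softmax policy over 10 digit labels is trained
# from a scalar reward alone, mirroring the reward-only setting.
# true_digit, the 0/1 reward, lr, and the baseline are assumptions.

rng = np.random.default_rng(0)
num_labels = 10
true_digit = 7                 # hypothetical target known only to the reward oracle
theta = np.zeros(num_labels)   # policy logits

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.5
baseline = 0.0                 # running-mean baseline to reduce gradient variance
for step in range(500):
    p = softmax(theta)
    a = rng.choice(num_labels, p=p)           # sample a "decoding result"
    reward = 1.0 if a == true_digit else 0.0  # scalar reward, no label exposed
    baseline += 0.05 * (reward - baseline)
    # REINFORCE: grad of log p(a) w.r.t. logits is (one_hot(a) - p)
    grad_logp = -p
    grad_logp[a] += 1.0
    theta += lr * (reward - baseline) * grad_logp

# After training, the policy should concentrate its mass on true_digit.
print(int(np.argmax(softmax(theta))))
```

The baseline subtraction is one of the variance-reduction choices the paper's setting makes relevant: with a sparse scalar reward and no supervised signal, raw REINFORCE updates are noisy, and whether training converges at all depends heavily on such configuration details.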