Unsupervised Acoustic-to-Articulatory Inversion Neural Network Learning Based on Deterministic Policy Gradient

Hayato Shibata, Mingxin Zhang, T. Shinozaki
{"title":"Unsupervised Acoustic-to-Articulatory Inversion Neural Network Learning Based on Deterministic Policy Gradient","authors":"Hayato Shibata, Mingxin Zhang, T. Shinozaki","doi":"10.1109/SLT48900.2021.9383554","DOIUrl":null,"url":null,"abstract":"This paper presents an unsupervised learning method of deep neural networks that perform acoustic-to-articulatory inversion for arbitrary utterances. Conventional unsupervised acoustic-to-articulatory inversion methods are based on the analysis-by-synthesis approach and non-linear optimization algorithms. One limitation is that they require time-consuming iterative optimizations to obtain articulatory parameters for a given target speech segment. Neural networks, after learning their relationship, can obtain these articulatory parameters without an iterative optimization. However, conventional methods need supervised learning and paired acoustic and articulatory samples. We propose a hybrid auto-encoder based unsupervised learning framework for the acoustic-to-articulatory inversion neural networks that can capture context information. The essential point of the framework is making the training effective. We investigate several reinforcement learning algorithms and show the usefulness of the deterministic policy gradient. Experimental results demonstrate that the proposed method can infer articulatory parameters not only for training set segments but also for unseen utterances. Averaged reconstruction errors achieved for open test samples are similar to or even lower than the conventional method that directly optimizes the articulatory parameters in a closed condition.","PeriodicalId":243211,"journal":{"name":"2021 IEEE Spoken Language Technology Workshop (SLT)","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT48900.2021.9383554","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

This paper presents an unsupervised learning method for deep neural networks that perform acoustic-to-articulatory inversion on arbitrary utterances. Conventional unsupervised acoustic-to-articulatory inversion methods are based on the analysis-by-synthesis approach and non-linear optimization algorithms. One limitation is that they require time-consuming iterative optimization to obtain the articulatory parameters for a given target speech segment. A neural network that has learned the mapping from acoustics to articulation can produce these articulatory parameters without iterative optimization, but conventional training of such networks is supervised and requires paired acoustic and articulatory samples. We propose a hybrid auto-encoder based unsupervised learning framework for acoustic-to-articulatory inversion neural networks that can capture context information. The key issue in this framework is making the training effective; to address it, we investigate several reinforcement learning algorithms and show the usefulness of the deterministic policy gradient. Experimental results demonstrate that the proposed method can infer articulatory parameters not only for training-set segments but also for unseen utterances. The averaged reconstruction errors achieved on open test samples are similar to, or even lower than, those of the conventional method that directly optimizes the articulatory parameters in a closed condition.
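To make the analysis-by-synthesis idea concrete, below is a minimal sketch of how an inversion network (the encoder) could be trained with a deterministic policy gradient when the articulatory synthesizer (the decoder) is treated as a non-differentiable black box. The feature dimensions, network sizes, critic design, and the `synthesize()` stand-in are illustrative assumptions for this sketch, not the configuration described in the paper.

```python
# Sketch of DPG-style training of an acoustic-to-articulatory inversion net.
# Assumptions (not from the paper): feature sizes, layer widths, a fixed random
# linear "synthesizer" as a stand-in for a real non-differentiable synthesizer.
import torch
import torch.nn as nn

ACOUSTIC_DIM, ARTIC_DIM = 40, 10  # assumed feature sizes

class InversionNet(nn.Module):
    """Encoder / deterministic policy: acoustic frame -> articulatory parameters."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACOUSTIC_DIM, 256), nn.ReLU(),
            nn.Linear(256, ARTIC_DIM), nn.Tanh())

    def forward(self, x):
        return self.net(x)

class Critic(nn.Module):
    """Predicts the reward (negative reconstruction error) for a state-action pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACOUSTIC_DIM + ARTIC_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, x, a):
        return self.net(torch.cat([x, a], dim=-1))

_SYNTH_W = torch.randn(ARTIC_DIM, ACOUSTIC_DIM)  # fixed random "vocal tract"

def synthesize(artic):
    """Stand-in for the non-differentiable articulatory synthesizer (decoder)."""
    with torch.no_grad():
        return torch.tanh(artic @ _SYNTH_W)

actor, critic = InversionNet(), Critic()
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(32, ACOUSTIC_DIM)  # batch of target acoustic frames
    a = actor(x)                       # inferred articulatory parameters
    # Reward: how well the synthesized speech reconstructs the target acoustics.
    reward = -((synthesize(a) - x) ** 2).mean(dim=-1, keepdim=True)

    # Critic regression: fit Q(x, a) to the observed reward.
    critic_loss = ((critic(x, a.detach()) - reward) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # DPG update: move the actor toward actions the critic scores higher,
    # so gradients never need to flow through the synthesizer itself.
    actor_loss = -critic(x, actor(x)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
```

The design choice the sketch illustrates is that the critic provides a differentiable surrogate for the reconstruction reward, so the inversion network can be updated by gradient ascent even though the synthesizer cannot be back-propagated through.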