Ultrasound-Based Silent Speech Interface Using Convolutional and Recurrent Neural Networks

Q1 Arts and Humanities
E. Juanpere, T. Csapó
Journal: Acta Acustica united with Acustica
Publication date: 2019-07-01
Publication type: Journal Article
DOI: https://doi.org/10.3813/AAA.919339
Citations: 17

Abstract

A Silent Speech Interface (SSI) is a technology that aims to synthesize speech from articulatory motion. This work proposes a deep neural network based SSI that takes ultrasound images of the tongue as input signals and predicts the spectral coefficients of a vocoder as target parameters. Several deep learning models are presented and discussed, including a baseline feed-forward network and a combination of convolutional and recurrent neural networks. A pre-processing step using a deep convolutional autoencoder was also studied. According to the experimental results, an architecture based on a CNN and bidirectional LSTM layers achieved the best objective and subjective results.
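To make the data flow of such a CNN plus bidirectional LSTM regressor concrete, the following sketch traces tensor shapes from a sequence of ultrasound frames to per-frame vocoder coefficients. It is only an illustration under assumed values: the image size, layer settings, hidden size, number of coefficients, and the names `ConvLayer`, `conv_out`, and `trace_shapes` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical description of one square 2D convolution layer."""
    out_channels: int
    kernel: int
    stride: int = 1
    padding: int = 0


def conv_out(size: int, layer: ConvLayer) -> int:
    # Standard output-size formula for a convolution along one spatial axis.
    return (size + 2 * layer.padding - layer.kernel) // layer.stride + 1


def trace_shapes(seq_len, height, width, convs, lstm_hidden, n_coeffs):
    """Return (stage, shape) pairs for one utterance of `seq_len` frames."""
    # Each ultrasound frame is treated as a single-channel image.
    shapes = [("input", (seq_len, 1, height, width))]
    channels, h, w = 1, height, width
    for i, layer in enumerate(convs, start=1):
        h, w = conv_out(h, layer), conv_out(w, layer)
        channels = layer.out_channels
        shapes.append((f"conv{i}", (seq_len, channels, h, w)))
    # The CNN features of each frame are flattened into one vector.
    shapes.append(("flatten", (seq_len, channels * h * w)))
    # A bidirectional LSTM concatenates forward and backward hidden states.
    shapes.append(("bilstm", (seq_len, 2 * lstm_hidden)))
    # A linear output layer predicts the spectral coefficients per frame.
    shapes.append(("output", (seq_len, n_coeffs)))
    return shapes


if __name__ == "__main__":
    # Assumed sizes, chosen only for illustration.
    convs = [
        ConvLayer(16, kernel=3, stride=2, padding=1),
        ConvLayer(32, kernel=3, stride=2, padding=1),
    ]
    for name, shape in trace_shapes(100, 64, 128, convs,
                                    lstm_hidden=128, n_coeffs=25):
        print(name, shape)
```

The point of the sketch is that the recurrent layers see one feature vector per ultrasound frame, so the output keeps the same number of time steps as the input and can be compared frame by frame with the vocoder targets.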
Source journal
CiteScore: 2.60
Self-citation rate: 0.00%
Articles published: 0
Review time: 6.8 months
Journal description: Ceased publication. Acta Acustica united with Acustica (Acta Acust united Ac) was published together with the European Acoustics Association (EAA). It was an international, peer-reviewed journal on acoustics. It published original articles on all subjects in the field of acoustics, such as • General Linear Acoustics, • Nonlinear Acoustics, Macrosonics, • Aeroacoustics, • Atmospheric Sound, • Underwater Sound, • Ultrasonics, • Physical Acoustics, • Structural Acoustics, • Noise Control, • Active Control, • Environmental Noise, • Building Acoustics, • Room Acoustics, • Acoustic Materials and Metamaterials, • Audio Signal Processing and Transducers, • Computational and Numerical Acoustics, • Hearing, Audiology and Psychoacoustics, • Speech, • Musical Acoustics, • Virtual Acoustics, • Auditory Quality of Systems, • Animal Bioacoustics, • History of Acoustics.