Ultrasound-Based Silent Speech Interface Using Convolutional and Recurrent Neural Networks

Q1 Arts and Humanities

Acta Acustica united with Acustica, Pub Date: 2019-07-01, DOI: 10.3813/AAA.919339

E. Juanpere, T. Csapó


Citations: 17

Abstract

A Silent Speech Interface (SSI) is a technology whose goal is to synthesize speech from articulatory motion. A Deep Neural Network based SSI is proposed that uses ultrasound images of the tongue as input signals and the spectral coefficients of a vocoder as target parameters. Several deep learning models are presented and discussed, including a baseline feed-forward network and a combination of Convolutional and Recurrent Neural Networks. A pre-processing step using a Deep Convolutional AutoEncoder was also studied. According to the experimental results, an architecture based on a CNN and bidirectional LSTM layers achieved the best objective and subjective results.
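The abstract names the best-performing architecture (a CNN followed by bidirectional LSTM layers) without giving its dimensions. The sketch below only traces how tensor shapes might flow through such a stack, from a sequence of ultrasound frames to one vector of spectral coefficients per frame. Every number in it (frame size, channel counts, kernel, stride, hidden size, 25 coefficients) and every function name is a hypothetical choice for illustration, not a value taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Shape of one utterance: a number of frames, each with feature dims."""
    frames: int
    dims: tuple


def conv2d(shape, out_channels, kernel=3, stride=2, padding=1):
    """Per-frame 2-D convolution: (C, H, W) -> (out_channels, H', W')."""
    _, h, w = shape.dims
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    return Shape(shape.frames, (out_channels, h_out, w_out))


def flatten(shape):
    """Collapse each frame's feature map into a single vector."""
    size = 1
    for d in shape.dims:
        size *= d
    return Shape(shape.frames, (size,))


def bilstm(shape, hidden):
    """Bidirectional LSTM over the frame axis.

    Forward and backward hidden states are concatenated, so each frame
    ends up with 2 * hidden features; the frame count is unchanged.
    """
    return Shape(shape.frames, (2 * hidden,))


def dense(shape, units):
    """Per-frame linear layer producing the vocoder targets."""
    return Shape(shape.frames, (units,))


def trace(frames=100, height=64, width=128, n_coeffs=25):
    """Return (stage name, shape) pairs for a hypothetical CNN + BiLSTM stack."""
    x = Shape(frames, (1, height, width))  # one grayscale image per frame
    stages = [("input", x)]
    x = conv2d(x, 16)
    stages.append(("conv1", x))
    x = conv2d(x, 32)
    stages.append(("conv2", x))
    x = flatten(x)
    stages.append(("flatten", x))
    x = bilstm(x, 128)
    stages.append(("bilstm", x))
    x = dense(x, n_coeffs)
    stages.append(("output", x))
    return stages


if __name__ == "__main__":
    for name, shape in trace():
        print(f"{name:8s} frames={shape.frames} dims={shape.dims}")
```

The point of the trace is that the convolutional layers act on each image independently, while the bidirectional LSTM is the only stage that mixes information across frames, and the output keeps one target vector per input frame.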


Source journal

Acta Acustica united with Acustica (Physics: Acoustics)

CiteScore

2.60

Self-citation rate

0.00%

Articles published

Review time

6.8 months

About the journal: Publication has ceased. Acta Acustica united with Acustica (Acta Acust united Ac) was published together with the European Acoustics Association (EAA). It was an international, peer-reviewed journal on acoustics. It published original articles on all subjects in the field of acoustics, such as General Linear Acoustics; Nonlinear Acoustics, Macrosonics; Aeroacoustics; Atmospheric Sound; Underwater Sound; Ultrasonics; Physical Acoustics; Structural Acoustics; Noise Control; Active Control; Environmental Noise; Building Acoustics; Room Acoustics; Acoustic Materials and Metamaterials; Audio Signal Processing and Transducers; Computational and Numerical Acoustics; Hearing, Audiology and Psychoacoustics; Speech; Musical Acoustics; Virtual Acoustics; Auditory Quality of Systems; Animal Bioacoustics; History of Acoustics.