Non-Audible Speech Classification Using Deep Learning Approaches

2019 International Conference on Computational Science and Computational Intelligence (CSCI). Pub Date: 2019-12-01. DOI: 10.1109/CSCI49370.2019.00118

Rommel Fernandes, Lei Huang, G. Vejarano


Citations: 2

Abstract

Recent advances in human-computer interaction (HCI) research aim to help stroke survivors cope with physiological problems such as speech impediments caused by aphasia. This paper investigates deep learning approaches to non-audible speech recognition from electromyography (EMG) signals and proposes a novel approach that combines continuous wavelet transforms (CWT) with convolutional neural networks (CNNs). To compare its performance with other popular deep learning approaches, we collected facial surface EMG bio-signals from subjects with binary and multi-class labels, then trained and tested four models: a long short-term memory (LSTM) model, a bi-directional LSTM model, a 1-D CNN model, and our proposed CWT-CNN model. Experimental results show that the proposed approach performs better than the LSTM models but is less efficient than the 1-D CNN model on our collected data set. In comparison with previous research, we gained insights into how to improve model performance for binary and multi-class silent speech recognition.
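As a rough illustration of the CWT stage named in the abstract, the sketch below turns a 1-D signal into a time-scale magnitude map (a scalogram), the kind of 2-D input a CNN could then classify. Everything specific here is an assumption made for illustration and is not taken from the paper: the Ricker (Mexican-hat) wavelet, the widths, the 1 kHz sampling rate, and the synthetic 50 Hz burst standing in for an EMG recording. The function names `ricker` and `cwt_scalogram` are hypothetical.

```python
import math


def ricker(num_points, width):
    """Sample a Ricker (Mexican-hat) wavelet, centred in a window of num_points."""
    norm = 2.0 / (math.sqrt(3.0 * width) * math.pi ** 0.25)
    centre = (num_points - 1) / 2.0
    samples = []
    for i in range(num_points):
        t = (i - centre) / width
        samples.append(norm * (1.0 - t * t) * math.exp(-0.5 * t * t))
    return samples


def cwt_scalogram(signal, widths):
    """Return |CWT| as one row per width, each row as long as the signal."""
    n = len(signal)
    rows = []
    for width in widths:
        # Truncate the wavelet support at about five widths on each side.
        half = min(int(5 * width), n - 1)
        kernel = ricker(2 * half + 1, width)
        row = []
        for centre in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                j = centre + k - half
                if 0 <= j < n:  # samples outside the signal count as zero
                    acc += signal[j] * w
            row.append(abs(acc))
        rows.append(row)
    return rows


# Synthetic stand-in for one EMG channel (assumption): silence with a short
# 50 Hz burst in the middle, sampled at an assumed 1 kHz.
SAMPLE_RATE_HZ = 1000
signal = [
    math.sin(2.0 * math.pi * 50.0 * i / SAMPLE_RATE_HZ) if 96 <= i < 160 else 0.0
    for i in range(256)
]
widths = [2, 4, 8, 16]  # assumed scales, in samples
scalogram = cwt_scalogram(signal, widths)
```

The resulting `scalogram` is a `len(widths)` by `len(signal)` grid of non-negative values that can be treated as a single-channel image. In practice a library routine (for example a CWT from a wavelet package) would replace the direct loops, which are written out here only to keep the sketch dependency-free.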

