Efficient Recognition and Classification of Stuttered Word from Speech Signal using Deep Learning Technique

Kalpana Murugan, Nikhil Kumar Cherukuri, Sai Subhash Donthu
Published in: 2022 IEEE World Conference on Applied Intelligence and Computing (AIC)
Publication date: 2022-06-17
DOI: 10.1109/AIC55036.2022.9848868 (https://doi.org/10.1109/AIC55036.2022.9848868)
Citation count: 1

Abstract

Fluency is a measure of how well a speaker communicates information to a listener. Stuttering is one of the fluency problems that significantly affect speech recognition: speech is disrupted by involuntary repetitions and prolongations of sounds, syllables, or words, as well as by external and internal noise. The objective of this study is to improve stuttered speech and to build a better speech recognition system that removes these involuntary prolongations and repetitions. To obtain a good-quality speech signal, we propose a method in which the stuttered voice signal is analyzed with a Convolutional Neural Network (CNN) classifier. To convert the input into recognized speech, the approach records the speaker's audio with a microphone, removes external noise and stammers, extracts features, and finally classifies the speech data. Performance is compared across several pre-processing filters (Median, Gaussian, Gabor, and Kalman) using Mean Square Error (MSE), Signal-to-Noise Ratio (SNR), Cross-Correlation (CC), Mean Absolute Error (MAE), and Peak Signal-to-Noise Ratio (PSNR). In the experiments, the proposed scheme outperforms established methods at preserving the overall intelligibility of the stuttered speech signal, by identifying the stuttered word and removing the repetitions or prolongations. At the pre-processing stage, the Kalman filter performs better than the other filters analyzed.
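
The abstract names the filters and quality measures but does not define them, so the following is only a minimal sketch of how such a comparison could be set up, not the authors' implementation. The metric definitions (SNR as reference power over error power, PSNR from the reference peak, CC as the zero-lag Pearson coefficient), the random-walk Kalman model, the function names, and the synthetic test signal are all assumptions made for illustration. Only the Median and Kalman filters are sketched; the Gaussian and Gabor filters and the CNN stage are omitted.

```python
import numpy as np


def quality_metrics(reference, estimate):
    """Compare an estimate against a clean reference (equal-length 1-D signals).

    The formulas are common textbook definitions, assumed here because the
    abstract does not state which ones were used.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    mse = np.mean(error ** 2)
    mae = np.mean(np.abs(error))
    # SNR in dB: reference energy over error energy.
    snr = 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))
    # PSNR in dB: squared peak amplitude of the reference over the MSE.
    psnr = 10.0 * np.log10(np.max(np.abs(reference)) ** 2 / mse)
    # CC: zero-lag normalized cross-correlation (Pearson coefficient).
    cc = np.corrcoef(reference, estimate)[0, 1]
    return {"MSE": mse, "MAE": mae, "SNR_dB": snr, "PSNR_dB": psnr, "CC": cc}


def median_filter(signal, width=5):
    """Sliding median with an odd window width and edge padding."""
    signal = np.asarray(signal, dtype=float)
    pad = width // 2
    padded = np.pad(signal, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1)


def kalman_filter(signal, process_var=1e-3, measurement_var=0.09):
    """Scalar Kalman filter with a random-walk state model (assumed model)."""
    signal = np.asarray(signal, dtype=float)
    estimate = np.empty(len(signal))
    x, p = signal[0], 1.0  # initial state and state variance
    for i, z in enumerate(signal):
        p = p + process_var              # predict: variance grows
        k = p / (p + measurement_var)    # Kalman gain
        x = x + k * (z - x)              # update with measurement z
        p = (1.0 - k) * p
        estimate[i] = x
    return estimate


# Synthetic stand-in for a speech signal: a slow sine plus white noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2000)
clean = np.sin(2.0 * np.pi * 3.0 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

for name, filtered in [("noisy", noisy),
                       ("median", median_filter(noisy)),
                       ("kalman", kalman_filter(noisy))]:
    scores = quality_metrics(clean, filtered)
    print(name, {key: round(float(value), 4) for key, value in scores.items()})
```

Which filter scores best on this toy signal depends on the assumed noise level and filter parameters, so the printed numbers say nothing about the paper's reported results.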