Efficient Recognition and Classification of Stuttered Word from Speech Signal using Deep Learning Technique
Kalpana Murugan, Nikhil Kumar Cherukuri, Sai Subhash Donthu
2022 IEEE World Conference on Applied Intelligence and Computing (AIC), published 2022-06-17
DOI: 10.1109/AIC55036.2022.9848868 (https://doi.org/10.1109/AIC55036.2022.9848868)
Citation count: 1
Abstract
Fluency is a measure of how well a speaker conveys information to a listener. Stuttering is a fluency disorder that significantly degrades speech recognition: speech is disrupted by involuntary repetitions and prolongations of words, as well as by external and internal noise. The objective of this study is to improve stuttered speech and to build a better speech recognition system that removes involuntary prolongations of sounds and repetitions of syllables or words. To obtain a good-quality speech signal, we propose a method in which the stuttered voice signal is analyzed with a Convolutional Neural Network (CNN) classifier. To convert the data into recognized speech, the input audio (a person's speech signal) is recorded with a microphone, external noise and stammers are removed, features are extracted, and the speech data is finally classified. The algorithm's performance is compared across several filters, such as the Median, Gaussian, Gabor, and Kalman filters, using measures such as Mean Square Error (MSE), Signal-to-Noise Ratio (SNR), Cross-Correlation (CC), Mean Absolute Error (MAE), and Peak Signal-to-Noise Ratio (PSNR). In the experiments, the proposed scheme outperforms established methods at maintaining the overall intelligibility of the stuttered speech signal by identifying stuttered words and removing the repetitions or prolongations. At the pre-processing stage, the Kalman filter performs better than the other filters analyzed.
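The abstract names the evaluation measures but does not give their formulas. The following is a minimal sketch of how such measures are commonly computed, not the authors' implementation. It assumes the textbook definitions, takes the peak in PSNR to be the largest absolute value of the reference signal, and treats CC as the normalized cross-correlation at zero lag. The function names and the simple edge-replicating median filter are hypothetical and chosen for illustration only.

```python
import math
from statistics import median


def mse(ref, est):
    """Mean square error between a reference signal and an estimate."""
    return sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)


def mae(ref, est):
    """Mean absolute error between a reference signal and an estimate."""
    return sum(abs(r - e) for r, e in zip(ref, est)) / len(ref)


def snr_db(ref, est):
    """Signal-to-noise ratio in dB; the residual (ref - est) is the noise.

    Undefined when est equals ref exactly (zero noise power).
    """
    signal_power = sum(r * r for r in ref)
    noise_power = sum((r - e) ** 2 for r, e in zip(ref, est))
    return 10.0 * math.log10(signal_power / noise_power)


def psnr_db(ref, est):
    """Peak SNR in dB. Assumption: peak = max |ref|."""
    peak = max(abs(r) for r in ref)
    return 10.0 * math.log10(peak ** 2 / mse(ref, est))


def cross_correlation(ref, est):
    """Normalized zero-lag cross-correlation (assumed meaning of CC)."""
    mean_r = sum(ref) / len(ref)
    mean_e = sum(est) / len(est)
    num = sum((r - mean_r) * (e - mean_e) for r, e in zip(ref, est))
    den = math.sqrt(
        sum((r - mean_r) ** 2 for r in ref)
        * sum((e - mean_e) ** 2 for e in est)
    )
    return num / den


def median_filter(signal, width=3):
    """1-D median filter with an odd window; edges are replicated."""
    half = width // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(signal))]


if __name__ == "__main__":
    # Toy data (illustrative only): a clean ramp with one impulsive spike.
    clean = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    noisy = list(clean)
    noisy[3] = 9.0
    filtered = median_filter(noisy)
    for name, sig in (("noisy", noisy), ("filtered", filtered)):
        print(name, "MSE:", mse(clean, sig), "MAE:", mae(clean, sig))
```

Under these definitions, a better pre-processing filter yields lower MSE and MAE and higher SNR, PSNR, and CC against the clean reference. The same functions could be applied to the output of any of the filters listed in the abstract.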