Efficient Recognition and Classification of Stuttered Word from Speech Signal using Deep Learning Technique

2022 IEEE World Conference on Applied Intelligence and Computing (AIC). Pub Date: 2022-06-17. DOI: 10.1109/AIC55036.2022.9848868

Kalpana Murugan, Nikhil Kumar Cherukuri, Sai Subhash Donthu


Citations: 1

Abstract

Fluency is a measure of how well a speaker communicates information to another person. Stuttering is one of the fluency disorders that significantly affect speech recognition: the flow of speech is disrupted by involuntary repetitions and prolongations of words, and is further degraded by external and internal noise. The objective of this study is to enhance stuttered speech and build a better speech recognition system that removes involuntary prolongations of sounds and repetitions of syllables or words. To obtain a good-quality speech signal, we propose a method in which the stuttered voice signal is analyzed with a Convolutional Neural Network (CNN) classifier. To convert the data into recognized speech, the input audio (a person's speech signal) is recorded with a microphone, external noise and stammers are removed, features are extracted, and the speech data are finally classified. Performance is compared across several pre-processing filters (Median, Gaussian, Gabor, and Kalman) using Mean Square Error (MSE), Signal-to-Noise Ratio (SNR), Cross-Correlation (CC), Mean Absolute Error (MAE), and Peak Signal-to-Noise Ratio (PSNR). The experimental observations indicate that the proposed scheme outperforms established methods in preserving the overall intelligibility of the stuttered speech signal, by identifying the stuttered word and removing the repetitions or prolongations. At the pre-processing stage, the Kalman filter performs better than the other filters evaluated.
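The abstract names five evaluation measures and four pre-processing filters but gives no formulas or parameters. The following is a minimal, standard-library-only Python sketch of how such a filter comparison could be set up, assuming common textbook definitions: SNR and PSNR in dB against a clean reference, PSNR peak defaulting to the reference's maximum absolute amplitude, CC taken as the zero-lag normalized (Pearson) correlation, and a scalar Kalman filter with a random-walk state model. The Gabor filter and the CNN classifier are not shown. All function names, parameters (window size `k`, `sigma`, `q`, `r`) and the synthetic sine-plus-noise test signal are hypothetical illustrations; this is not the authors' implementation and does not reproduce their results.

```python
import math
import random
import statistics


# ---- Evaluation metrics: ref = clean reference, est = processed signal ----
def mse(ref, est):
    """Mean Square Error."""
    return sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)


def mae(ref, est):
    """Mean Absolute Error."""
    return sum(abs(r - e) for r, e in zip(ref, est)) / len(ref)


def snr_db(ref, est):
    """SNR in dB: reference power over residual-error power (assumed definition)."""
    signal_power = sum(r * r for r in ref)
    noise_power = sum((r - e) ** 2 for r, e in zip(ref, est))
    return 10 * math.log10(signal_power / noise_power)


def psnr_db(ref, est, peak=None):
    """PSNR in dB; peak defaults to max |ref| (an assumed convention)."""
    if peak is None:
        peak = max(abs(r) for r in ref)
    return 10 * math.log10(peak ** 2 / mse(ref, est))


def cross_corr(ref, est):
    """Zero-lag normalized cross-correlation (Pearson); assumed meaning of CC."""
    mr, me = statistics.fmean(ref), statistics.fmean(est)
    num = sum((r - mr) * (e - me) for r, e in zip(ref, est))
    den = math.sqrt(sum((r - mr) ** 2 for r in ref) * sum((e - me) ** 2 for e in est))
    return num / den


# ---- Pre-processing filters (Gabor omitted) ----
def median_filter(x, k=5):
    """Sliding-window median; the window is truncated at the signal edges."""
    half = k // 2
    return [statistics.median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def gaussian_filter(x, sigma=2.0):
    """Convolution with a truncated Gaussian kernel, renormalized at the edges."""
    radius = int(3 * sigma)
    kernel = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in range(-radius, radius + 1)]
    out = []
    for i in range(len(x)):
        acc = wsum = 0.0
        for d, w in zip(range(-radius, radius + 1), kernel):
            j = i + d
            if 0 <= j < len(x):
                acc += w * x[j]
                wsum += w
        out.append(acc / wsum)
    return out


def kalman_filter(z, q=1e-3, r=0.09):
    """Scalar Kalman filter with an assumed random-walk state model.

    x_k = x_{k-1} + w_k,  w_k ~ N(0, q)   (process noise)
    z_k = x_k + v_k,      v_k ~ N(0, r)   (measurement noise)
    """
    x_hat, p = z[0], 1.0
    out = []
    for zk in z:
        p += q                     # predict: state variance grows by q
        k = p / (p + r)            # Kalman gain
        x_hat += k * (zk - x_hat)  # update with the measurement residual
        p *= 1 - k                 # posterior variance
        out.append(x_hat)
    return out


def compare_filters(n=1000, noise_std=0.3, seed=0):
    """Score each filter on a hypothetical test signal: slow sine + white Gaussian noise."""
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * i / 500) for i in range(n)]
    noisy = [c + rng.gauss(0.0, noise_std) for c in clean]
    candidates = {"noisy (no filter)": noisy, "median": median_filter(noisy),
                  "gaussian": gaussian_filter(noisy),
                  "kalman": kalman_filter(noisy, r=noise_std ** 2)}
    metrics = {"MSE": mse, "MAE": mae, "SNR": snr_db, "PSNR": psnr_db, "CC": cross_corr}
    return {name: {m: f(clean, est) for m, f in metrics.items()}
            for name, est in candidates.items()}


if __name__ == "__main__":
    for name, scores in compare_filters().items():
        print(f"{name:18s}", "  ".join(f"{m}={v:.4f}" for m, v in scores.items()))
```

Running the script prints one row of scores per filter. Lower MSE and MAE, and higher SNR, PSNR and CC, indicate a closer match to the clean reference. Because the test signal is a synthetic sinusoid rather than recorded stuttered speech, the ranking it produces says nothing about the paper's findings.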

