Hybrid Approach to Detect Prolonged Speech Segments

International Journal of Engineering and Advanced Technology. Pub Date: 2023-04-30. DOI: 10.35940/ijeat.d4106.0412423


Citations: 1

Abstract

In recent decades, various methods have been introduced to automatically detect prolonged speech segments in stuttered speech signals. However, researchers have paid less attention to detecting the prolongation disorder at the parametric level. The aim of this study is to propose a hybrid approach that detects prolonged speech segments by combining various spectral parameters, together with their recognition accuracies for the reconstructed speech signal. The paper covers prolonged-segment detection with the parameters considered individually and in combination, validation of the prolongation detection system, the MFCC (Mel-frequency cepstral coefficient) feature extraction process, and basic model accuracies for the reconstructed signals. The proposed methods are simulated and tested on a dataset derived from UCLASS. The results are compared with existing work on prolongation detection at the parametric and word levels. The hybrid parameters yield a recognition rate of 92% for larger frame sizes of 200 ms when modeled with a support vector machine (SVM). The results are also tabulated and discussed in terms of sensitivity, specificity, and accuracy in detecting the prolonged segments. The study also examines the prolongation characteristics of vocalized and non-vocalized sounds at the phoneme level. A detection accuracy of 71% is observed for vocalized prolonged vowel phonemes over non-vocalized prolonged signals.

Objectives: The objective of this work is to propose a hybrid algorithm that automatically detects prolonged segments in speech signals with prolongation disorder. A second objective is to evaluate the performance of the obtained spectral parameters by applying various evaluation metrics and models to compute the recognition accuracy of a reconstructed signal. The objective is further extended to bring out the importance of the variable frame size concept and to analyze the variations in vocalized and non-vocalized sounds.
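
The sensitivity, specificity, and accuracy metrics named above can be illustrated with a minimal sketch. It assumes frame-level binary labels (1 for a prolonged frame, 0 otherwise); the abstract does not state the granularity at which the authors score detections, so that labeling and the function name `detection_metrics` are assumptions made here for illustration.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy for binary labels.

    Assumption: 1 marks a prolonged frame, 0 a non-prolonged frame.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # prolonged, detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, left alone
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # prolonged, missed

    nan = float("nan")
    return {
        # Share of truly prolonged frames that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Share of non-prolonged frames correctly left unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # Share of all frames labeled correctly.
        "accuracy": (tp + tn) / len(pairs),
    }
```

Reporting sensitivity and specificity alongside accuracy matters when prolonged frames are much rarer than normal ones, since accuracy alone can look high for a detector that rarely flags anything.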
Methods: The methods adopted to detect prolonged speech segments are discussed at two levels: preprocessing and modeling. The preprocessing level is discussed by applying the parameters individually and at a hybrid level, by combining the Centroid, Entropy, Energy, and ZCR (zero-crossing rate) parameters and the MFCC feature extraction method. A new method using specificity, sensitivity, and accuracy metrics is applied to validate the performance of the prolongation detection model. At the modeling level, these parameters are discussed by applying evaluation metrics to clustering and classification models such as K-means, FCM (fuzzy c-means), and SVM. The performance of these methods is used to evaluate and estimate the prolonged-segment detection accuracy for the reconstructed speech signals of vocalized and non-vocalized sounds. All these methods are discussed in detail in the following sections.

Findings: Hybridizing the spectral parameters to detect prolonged speech segments automatically is a major finding of this work. Specificity, sensitivity, and accuracy metrics are also found to play a major role in designing and validating the prolongation detection model. Further experiments show that the hybrid and verification metrics are better suited to vocalized and non-vocalized sounds when larger frame lengths are considered. SVM is found to perform better under all of the above considerations.

Novelty: According to the literature survey, prolongation has so far been detected using individual parameters or only a few parameters. Existing works do not address applying or combining more than two parameters to detect prolonged speech segments. The novelty of this work lies in selecting and combining the spectral parameters at the preprocessing stage to detect the prolongation disorder. Spectral centroid and entropy are considered appropriate parameters alongside the ZCR and Energy parameters. Hybridizing these parameters is therefore the novel basis for the proposed automatic prolongation detection system. Further novelty comes from applying specificity, sensitivity, and accuracy metrics to build and evaluate the detection system for vocalized and non-vocalized prolonged sounds.
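
As a rough illustration of the four parameters combined in the hybrid approach, the sketch below computes spectral centroid, spectral entropy, short-time energy, and ZCR for each frame of a signal. It is not the authors' implementation: the non-overlapping framing, the use of the normalized power spectrum for centroid and entropy, and the names `frame_signal` and `hybrid_features` are all assumptions chosen for clarity. Only the 200 ms default frame size comes from the abstract.

```python
import numpy as np


def frame_signal(x, sr: int, frame_ms: float = 200.0) -> np.ndarray:
    """Split x into non-overlapping frames; a trailing partial frame is dropped."""
    x = np.asarray(x, dtype=float)
    n = int(round(sr * frame_ms / 1000.0))  # samples per frame
    if n <= 0 or x.size < n:
        raise ValueError("signal is shorter than one frame")
    n_frames = x.size // n
    return x[: n_frames * n].reshape(n_frames, n)


def hybrid_features(x, sr: int, frame_ms: float = 200.0) -> np.ndarray:
    """Return one row per frame: [centroid_hz, entropy_bits, energy, zcr]."""
    frames = frame_signal(x, sr, frame_ms)
    n = frames.shape[1]

    # Short-time energy: mean squared amplitude of the frame.
    energy = np.mean(frames ** 2, axis=1)

    # ZCR: fraction of adjacent sample pairs whose signs differ.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    # Power spectrum, normalized per frame so it sums to (almost) 1.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    eps = 1e-12  # guards against division by zero and log(0)
    p = power / (power.sum(axis=1, keepdims=True) + eps)

    # Spectral centroid: power-weighted mean frequency in Hz.
    centroid = (p * freqs).sum(axis=1)

    # Spectral entropy: Shannon entropy of the normalized spectrum, in bits.
    entropy = -(p * np.log2(p + eps)).sum(axis=1)

    return np.column_stack([centroid, entropy, energy, zcr])
```

The resulting matrix has one row per frame and could be passed to a clustering or classification model such as K-means or an SVM. Because the four columns have very different scales, they would normally be standardized first.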
