Classification of functional dysphonia using the tunable Q wavelet transform

IF 2.4 | Zone 3, Computer Science | JCR Q2, Acoustics

Speech Communication Pub Date : 2023-10-06 DOI:10.1016/j.specom.2023.102989

Kiran Reddy Mittapalle, Madhu Keerthana Yagnavajjula, Paavo Alku

Speech Communication, vol. 155, Article 102989. Full text: https://www.sciencedirect.com/science/article/pii/S0167639323001231


Abstract

Functional dysphonia (FD) refers to an abnormality in voice quality in the absence of an identifiable lesion. In this paper, we propose an approach based on the tunable Q wavelet transform (TQWT) to automatically distinguish two types of FD (hyperfunctional dysphonia and hypofunctional dysphonia) from healthy voice using the acoustic voice signal. Using the TQWT, voice signals were decomposed into sub-bands, and entropy values extracted from the sub-bands were used as features for the three-class classification problem. In addition, Mel-frequency cepstral coefficients (MFCCs) and glottal features were extracted from the acoustic voice signal and the estimated glottal source signal, respectively. A convolutional neural network (CNN) classifier was trained separately on the TQWT, MFCC and glottal features. Experiments were conducted using voice signals of 57 healthy speakers and 113 FD patients (72 with hyperfunctional dysphonia and 41 with hypofunctional dysphonia) taken from the VOICED database. These experiments revealed that the TQWT features yielded absolute improvements of 5.5% and 4.5% over the baseline MFCC features and glottal features, respectively. Furthermore, the highest classification accuracy (67.91%) was obtained using the combination of the TQWT and glottal features, indicating that these features are complementary.
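The feature pipeline described above (decompose the signal into sub-bands, then take one entropy value per sub-band) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the TQWT parameters or the entropy definition, so the sketch substitutes equal-width FFT band masks for the TQWT and uses Shannon entropy of the normalized sample energy. The function name `subband_entropies`, the band count and the synthetic test signal are all assumptions made for illustration. The last lines compute the majority-class proportion from the speaker counts given in the abstract, assuming one recording per speaker, as context for the reported accuracy.

```python
import numpy as np

def subband_entropies(signal, n_bands=8):
    """Return one Shannon entropy value (in bits) per sub-band.

    Assumption: equal-width FFT band masks stand in for the TQWT,
    which is NOT implemented here.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    # Band edges as indices into the one-sided spectrum.
    edges = np.linspace(0, spectrum.size, n_bands + 1, dtype=int)
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Keep only the bins of this band, then go back to the time domain.
        band = np.zeros_like(spectrum)
        band[lo:hi] = spectrum[lo:hi]
        x = np.fft.irfft(band, n=signal.size)
        energy = x ** 2
        total = energy.sum()
        if total == 0:
            feats.append(0.0)  # silent band: define its entropy as zero
            continue
        p = energy / total     # normalized energy distribution over samples
        p = p[p > 0]           # drop zeros so log2 is defined
        feats.append(float(-(p * np.log2(p)).sum()))
    return np.array(feats)

# Synthetic stand-in for a voice recording (hypothetical data).
rng = np.random.default_rng(0)
features = subband_entropies(rng.standard_normal(1024), n_bands=8)
print(features)

# Speaker counts taken from the abstract.
counts = {"healthy": 57, "hyperfunctional": 72, "hypofunctional": 41}
majority_baseline = max(counts.values()) / sum(counts.values())
print(f"majority-class proportion: {majority_baseline:.4f}")
```

A feature vector of this kind, one value per sub-band, is what would be passed to a classifier. Swapping in a real TQWT decomposition would change only the loop that produces each sub-band signal.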




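Source journal

Speech Communication (Engineering & Technology: Computer Science, Interdisciplinary Applications)

CiteScore

6.80

Self-citation rate

6.20%

Articles published

Review time

19.2 weeks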

Journal description: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's primary objectives are: • to present a forum for the advancement of human and human-machine speech communication science; • to stimulate cross-fertilization between different fields of this domain; • to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.