Optimized early fusion of handcrafted and deep learning descriptors for voice pathology detection and classification

Healthcare analytics (New York, N.Y.), Volume 6, Article 100369. Published 2024-12-01. DOI: 10.1016/j.health.2024.100369

Roohum Jegan, R. Jayagowri


Citations: 0

Abstract

This study presents an automated, noninvasive approach to voice disorder detection and classification based on an optimized fusion of modified glottal source estimation descriptors and deep transfer learning neural network descriptors. A new set of modified descriptors, based on a glottal source estimator and on features from a pre-trained Inception-ResNet-v2 convolutional neural network, is proposed for the speech disorder detection and classification task. The modified feature set is obtained from mel-cepstral coefficients, harmonic model features, phase discrimination means, distortion deviation descriptors, conventional wavelet features, and glottal source estimation features. Early (descriptor-level) fusion is employed to enhance performance; however, fusion increases the dimensionality of the feature vector. A nature-inspired slime mould algorithm is therefore used to remove redundant features and select the most discriminative ones. Finally, classification is performed with a k-nearest neighbor (KNN) classifier. The proposed algorithm was evaluated through extensive experiments with different feature combinations, with and without feature selection, on two popular datasets: the Arabic Voice Pathology Database (AVPD) and the Saarbrucken Voice Database (SVD). We show that the proposed optimized fusion method attains a voice pathology detection accuracy of 98.46% on the SVD database, which covers a wide spectrum of voice disorders. Furthermore, compared with traditional handcrafted and deep neural network-based techniques, the proposed method achieves competitive performance with fewer features.
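The pipeline in the abstract has three stages: early (descriptor-level) fusion, wrapper feature selection, and KNN classification. The sketch below illustrates the fusion step and the kind of objective a wrapper selector such as the slime mould algorithm would minimize over binary feature masks. It is not the authors' implementation. The z-score normalization, the 1-NN classifier, the 0.99/0.01 weighting of validation error against subset size, the function names, and the synthetic data are all assumptions for illustration, and the slime mould search loop itself is omitted.

```python
import numpy as np


def early_fusion(*blocks: np.ndarray) -> np.ndarray:
    """Descriptor-level (early) fusion: concatenate feature blocks column-wise.

    Each block has shape (n_samples, d_i) and must describe the same samples
    in the same order; the result has shape (n_samples, sum(d_i)).
    """
    n = blocks[0].shape[0]
    if any(b.shape[0] != n for b in blocks):
        raise ValueError("all descriptor blocks must have the same number of rows")
    return np.hstack(blocks)


def knn_predict(X_train, y_train, X_test, k=1):
    """k-nearest-neighbor majority vote with Euclidean distance.

    Labels must be non-negative integers (required by np.bincount);
    ties in the vote go to the smallest label.
    """
    # Squared Euclidean distances, shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (n_test, k)
    return np.array([np.bincount(v).argmax() for v in votes])


def subset_fitness(mask, X_train, y_train, X_val, y_val, k=1, alpha=0.99):
    """Wrapper fitness of a binary feature mask (lower is better).

    ASSUMPTION: weighted sum of KNN validation error and the fraction of
    features kept; alpha = 0.99 is illustrative, not taken from the paper.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0  # an empty subset gets the worst possible score
    # z-score the selected columns using training statistics only
    mu = X_train[:, mask].mean(axis=0)
    sd = X_train[:, mask].std(axis=0) + 1e-12
    Xtr = (X_train[:, mask] - mu) / sd
    Xva = (X_val[:, mask] - mu) / sd
    err = np.mean(knn_predict(Xtr, y_train, Xva, k) != y_val)
    return alpha * err + (1 - alpha) * mask.sum() / mask.size


# Toy demo on synthetic data (hypothetical stand-ins for real descriptors)
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)  # 0 = healthy, 1 = pathological (toy labels)
# "handcrafted" block: 3 label-dependent columns + 5 pure-noise columns
hand = np.hstack([2.0 * y[:, None] + rng.normal(0, 0.5, (n, 3)),
                  rng.normal(0, 1, (n, 5))])
deep = rng.normal(0, 1, (n, 8))  # "deep" block: 8 pure-noise columns
X = early_fusion(hand, deep)     # shape (200, 16)

tr, va = slice(0, 150), slice(150, None)
all_features = np.ones(X.shape[1], dtype=bool)
informative = np.arange(X.shape[1]) < 3
f_all = subset_fitness(all_features, X[tr], y[tr], X[va], y[va])
f_inf = subset_fitness(informative, X[tr], y[tr], X[va], y[va])
print(f"all 16 features: {f_all:.4f}, 3 informative: {f_inf:.4f}")
```

Any search strategy that proposes binary masks, for example a binary variant of the slime mould algorithm, could call `subset_fitness` as its objective. In this toy setup the 3-column informative mask should score lower than the full 16-column fused vector, since it drops the noise columns and pays a smaller size penalty, which mirrors the abstract's point that selection can keep accuracy with fewer features.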
