Enhanced early detection of dysarthric speech disabilities using stacking ensemble deep learning model

Impact factor (IF): 4.9

Machine Learning with Applications | Pub Date: 2025-08-05 | DOI: 10.1016/j.mlwa.2025.100721

Jagat Chaitanya Prabhala, Ravi Ragoju, Venkatanareshbabu Kuppili, Christophe Chesneau

Volume 21, Article 100721. Full text: https://www.sciencedirect.com/science/article/pii/S2666827025001045

Citations: 0

Abstract


Enhanced early detection of dysarthric speech disabilities using stacking ensemble deep learning model

Communication disorders, particularly dysarthria, significantly impact individuals by impairing their speech clarity, social interactions, and overall well-being. Early and accurate detection is crucial to enable timely intervention and improve speech therapy outcomes. This study introduces Adaptive Dysarthric Speech Disability Detection using Stacked Ensemble Deep Learning (ADSDD-SEDL), an innovative ensemble-based deep-learning framework for dysarthria detection. The proposed model integrates three deep learning architectures—Multi-Head Attention-based Long Short-Term Memory (MHALSTM), Deep Belief Network (DBN), and Time-Delay Neural Network (TDNN)—within a stacked ensemble model. Unlike conventional stacking methods that use fixed meta-classifiers, this study employs a Genetic Algorithm (GA)-based optimization strategy to dynamically determine optimal weight contributions of the base models, enhancing classification robustness and adaptability.
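The GA-based weighting described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the fitness function, population size, crossover, or mutation scheme, so everything below (validation accuracy as fitness, blend crossover, Gaussian mutation, and the names `fuse`, `fitness`, and `ga_optimize_weights`) is an assumption made for illustration. The base-model outputs are taken to be per-sample probabilities of the dysarthric class.

```python
import numpy as np

def fuse(weights, probs):
    """Weighted average of base-model probabilities.

    weights: (n_models,), probs: (n_models, n_samples) -> (n_samples,)
    Weights are made non-negative and normalised to sum to 1.
    """
    w = np.abs(weights)
    w = w / (w.sum() + 1e-12)
    return w @ probs

def fitness(weights, probs, labels):
    """Assumed fitness: accuracy of the fused prediction at a 0.5 threshold."""
    return float(np.mean((fuse(weights, probs) >= 0.5) == labels))

def ga_optimize_weights(probs, labels, pop_size=30, generations=40,
                        mutation_scale=0.1, seed=0):
    """Evolve ensemble weights with elitism, blend crossover, Gaussian mutation."""
    rng = np.random.default_rng(seed)
    n_models = probs.shape[0]
    pop = rng.random((pop_size, n_models))
    # Seed the population with each single model and with uniform averaging,
    # so the result can never score below those baselines on this data.
    pop[:n_models] = np.eye(n_models)
    pop[n_models] = 1.0 / n_models
    for _ in range(generations):
        scores = np.array([fitness(ind, probs, labels) for ind in pop])
        order = np.argsort(scores)[::-1]
        elite = pop[order[: pop_size // 2]]           # keep the better half
        n_children = pop_size - len(elite)
        parents_a = elite[rng.integers(0, len(elite), n_children)]
        parents_b = elite[rng.integers(0, len(elite), n_children)]
        alpha = rng.random((n_children, 1))
        children = alpha * parents_a + (1.0 - alpha) * parents_b   # crossover
        children += rng.normal(0.0, mutation_scale, children.shape)  # mutation
        pop = np.vstack([elite, children])
    scores = np.array([fitness(ind, probs, labels) for ind in pop])
    best = np.abs(pop[int(np.argmax(scores))])
    return best / (best.sum() + 1e-12), float(scores.max())
```

In practice `probs` would hold held-out predictions from the MHALSTM, DBN, and TDNN base models; the weights should be evolved on a validation split and then checked on a separate test split, since the fitness here is measured on the same data it is optimised against.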

The preprocessing pipeline converts speech signals from the time domain to the frequency domain by using a Short-Time Fourier Transform (STFT). Mel-Frequency Cepstral Coefficients (MFCCs) were extracted to capture the key spectral characteristics. Each base model underwent independent training, and the GA optimized the ensemble by evolving an adaptive weight distribution instead of relying on predefined fusion methods. Extensive simulations and hyperparameter tuning confirmed that the GA-optimized ADSDD-SEDL technique significantly improved detection efficiency over traditional ensemble approaches. These findings underscore the advantages of evolutionary optimization in refining speech disorder classification models. This scalable and adaptive model offers a valuable tool for healthcare professionals, enabling precise and automated early diagnosis of dysarthria. Future research could explore alternative evolutionary algorithms, reinforcement learning techniques, and hybrid deep learning approaches to enhance speech disorder classification.
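As a rough illustration of the preprocessing step, the following sketch computes MFCCs from an STFT using NumPy only. The abstract gives no frame length, hop size, FFT size, number of mel filters, or number of coefficients, so the values below (25 ms frames and 10 ms hop at 16 kHz, 26 filters, 13 coefficients) are common defaults chosen as assumptions, and the function names are hypothetical.

```python
import numpy as np

def stft_power(signal, frame_len=400, hop=160, n_fft=512):
    """Split into overlapping Hann-windowed frames and return the power spectrum."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=26):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, len(freqs)))
    for m in range(n_mels):
        lo, mid, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc(signal, sr=16000, n_mfcc=13, n_mels=26, frame_len=400, hop=160, n_fft=512):
    """STFT power -> mel filterbank energies -> log -> DCT-II (unnormalised)."""
    power = stft_power(signal, frame_len, hop, n_fft)
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    n = np.arange(n_mels)
    basis = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ basis.T  # (n_frames, n_mfcc)
```

Each row of the returned matrix is one frame's coefficient vector; a sequence model such as an LSTM or TDNN would typically consume the rows in order, while a non-sequential model would need them pooled or flattened first.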


Source journal

Machine Learning with Applications (Management Science and Operations Research; Artificial Intelligence; Computer Science Applications)

Self-citation rate

0.00%

Publication count

Review time

98 days