MSDFEN: Multi-scale dynamic feature extraction network for pathological voice detection

IF 3.4, Zone 2 (Physics and Astrophysics), Q1 ACOUSTICS

Applied Acoustics, Pub Date: 2024-11-28, DOI: 10.1016/j.apacoust.2024.110438

Zhiyuan Dai, Yuyang Jiang, Laiyuan Cao, Xiaojun Zhang, Zhi Tao

Applied Acoustics, Volume 230, Article 110438. Full text: https://www.sciencedirect.com/science/article/pii/S0003682X24005899

Citations: 0

Abstract

With the rapid development of deep learning methods, their application in pathological voice detection has become increasingly extensive, yielding promising results. However, most deep learning methods used in pathological voice detection employ network architectures in which all constituent modules are static, performing the same operations on all inputs. This significantly limits the model’s adaptive capacity and generalization ability and restricts further improvement in model performance. To address this issue, this paper proposes a novel pathological voice detection system called the Multi-Scale Dynamic Feature Extraction Network (MSDFEN), designed to enhance the performance and adaptive capability of pathological voice detection systems. In the MSDFEN model, sinc filter banks combined with a channel attention mechanism are employed to preprocess vocal signals, effectively capturing the high-frequency transitions characteristic of pathological voices. Furthermore, dynamic blocks, each consisting of multiple dynamic components, are designed and integrated into a multi-scale convolutional neural network, significantly enhancing the network’s dynamic performance and enriching the features obtained through multi-scale fusion. Comparative experiments and ablation studies were conducted on three internationally recognized pathological voice detection databases: MEEI, SVD, and HUPA. The proposed model achieved recognition accuracies of 0.9883, 0.7424, and 0.8409 on these databases, respectively, and other metrics also yielded satisfactory results. Experimental results indicate that the proposed method exhibits excellent adaptive and generalization capabilities in pathological voice detection.
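The abstract does not give layer-level details, so the following is only a minimal sketch of the general idea behind a sinc filter bank followed by channel attention, not the authors' implementation. It assumes windowed-sinc band-pass kernels (a difference of two low-pass sinc filters under a Hamming window) and replaces the learned attention layers with a fixed sigmoid gate on relative channel energy. All function names, band edges, and the test signal are hypothetical and chosen for illustration.

```python
import math


def lowpass_tap(cutoff_hz, k, sample_rate):
    """Ideal low-pass impulse response at offset k from the kernel centre."""
    x = 2.0 * cutoff_hz / sample_rate
    return x if k == 0 else math.sin(math.pi * x * k) / (math.pi * k)


def sinc_bandpass(f_low, f_high, sample_rate, taps=101):
    """Windowed-sinc band-pass kernel: high-cutoff low-pass minus low-cutoff low-pass."""
    mid = (taps - 1) // 2
    kernel = []
    for n in range(taps):
        k = n - mid
        hamming = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))
        band = lowpass_tap(f_high, k, sample_rate) - lowpass_tap(f_low, k, sample_rate)
        kernel.append(band * hamming)
    return kernel


def convolve_same(signal, kernel):
    """Direct convolution with zero padding; output has the same length as the input."""
    mid = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + mid - j
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def channel_attention(channels):
    """Rescale each channel by a gate computed from the input itself.

    Assumption: a fixed sigmoid of relative mean energy stands in for the
    learned squeeze-and-excitation layers a real network would use.
    """
    energy = [sum(v * v for v in ch) / len(ch) for ch in channels]  # squeeze step
    mean_energy = sum(energy) / len(energy)
    if mean_energy == 0.0:
        weights = [0.5] * len(channels)
    else:
        weights = [1.0 / (1.0 + math.exp(-(e / mean_energy - 1.0))) for e in energy]
    scaled = [[w * v for v in ch] for w, ch in zip(weights, channels)]
    return scaled, weights


# Hypothetical input: a 3 kHz tone sampled at 16 kHz.
sample_rate = 16000
signal = [math.sin(2.0 * math.pi * 3000.0 * t / sample_rate) for t in range(400)]

# Hypothetical band edges in Hz; a trainable front end would learn these.
bands = [(100.0, 1000.0), (1000.0, 2500.0), (2500.0, 4000.0)]
kernels = [sinc_bandpass(lo, hi, sample_rate) for lo, hi in bands]
channels = [convolve_same(signal, k) for k in kernels]
scaled, weights = channel_attention(channels)
print([round(w, 3) for w in weights])
```

Because the gate is computed from each input's own channel energies, the same filter bank weights its bands differently for different signals, which is the sense in which such a component is dynamic rather than static. For the 3 kHz tone above, the band covering 2500 to 4000 Hz receives the largest weight.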




Source journal

Applied Acoustics (Physics: Acoustics)

CiteScore

7.40

Self-citation rate

11.80%

Articles published

618

Review time

7.5 months

About the journal: Since its launch in 1968, Applied Acoustics has been publishing high quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense. Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The Journal aims to encourage the exchange of practical experience through publication and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics encourages the exchange of practical experience in the following ways: • Complete Papers • Short Technical Notes • Review Articles; and thereby provides a wealth of technological information that can be used to solve related problems. Manuscripts that address all fields of applications of acoustics ranging from medicine and NDT to the environment and buildings are welcome.