Time Series Classification of Raw Voice Waveforms for Parkinson's Disease Detection Using Generative Adversarial Network-Driven Data Augmentation

IEEE Open Journal of the Computer Society. Pub Date: 2024-11-22. DOI: 10.1109/OJCS.2024.3504864

Marta Rey-Paredes;Carlos J. Pérez;Alfonso Mateos-Caballero

Volume 6, pages 72-84. Full text: https://ieeexplore.ieee.org/document/10764737/


Abstract

Parkinson's disease (PD) is a neurodegenerative disorder that affects more than 10 million people worldwide. Despite its prevalence, detecting PD remains difficult, as no gold standard test has yet been developed to provide an accurate diagnosis. In this context, many recent studies have focused on the automatic detection and progression tracking of PD from voice-related characteristics, with feature engineering being the most common approach. This work addresses an existing research gap by introducing a novel strategy that analyzes raw voice waveforms. Despite recent advancements, the lack of extensive and diverse datasets remains a significant hurdle, so this article also implements a data augmentation solution: Big Vocoder Slicing Adversarial Network (BigVSAN) is used to generate synthetic voice data that mimics the characteristics of real patients and healthy subjects. For the PD detection task, deep learning models such as ResNet, LSTM-FCN, InceptionTime, and CDIL-CNN are used. The experiments were performed on the sustained vowel /a/ speech task in the PC-GITA database, which contains recordings of healthy and PD subjects. CDIL-CNN achieves the best results, improving accuracy by 15.87% over the same model trained without augmented data and by 8.96% over the best method found in the literature that uses voice waveforms. The results indicate that models trained on raw waveforms show modest but promising performance, underscoring the potential of audio analysis to improve the early detection of PD through a non-invasive and potentially remotely applicable method.
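To make the idea of classifying raw waveforms more concrete, the following is a minimal sketch of the basic operation that a circular dilated convolutional network applies to a sequence. It assumes that CDIL-CNN stands for a circular dilated CNN, in which each filter tap is spaced `dilation` samples apart and indices wrap around the ends of the signal instead of being zero-padded. The function names `circular_dilated_conv1d` and `receptive_field` are hypothetical, and this is not the authors' implementation or the models evaluated in the article.

```python
def circular_dilated_conv1d(x, kernel, dilation):
    """One 1-D circular dilated convolution over a list of samples.

    x        : sequence of samples (the waveform), length T
    kernel   : filter taps, odd length K so the filter is centred
    dilation : gap, in samples, between neighbouring taps
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    n = len(x)
    center = len(kernel) // 2
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Index wraps around modulo n (circular padding, an assumption here).
            acc += w * x[(t + (k - center) * dilation) % n]
        out.append(acc)
    return out


def receptive_field(kernel_size, num_layers):
    """Samples seen by one output when dilation doubles per layer: 1, 2, 4, ..."""
    return 1 + (kernel_size - 1) * (2 ** num_layers - 1)


if __name__ == "__main__":
    waveform = [float(i) for i in range(8)]  # toy stand-in for audio samples
    print(circular_dilated_conv1d(waveform, [1.0, 0.0, 1.0], dilation=2))
    print(receptive_field(kernel_size=3, num_layers=3))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is one reason this family of models is a plausible fit for long raw audio sequences.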

