An ensemble technique to predict Parkinson's disease using machine learning algorithms

IF 3.0, CAS Zone 3 (Computer Science), JCR Q2 (Acoustics)

Speech Communication Pub Date : 2024-04-01 DOI:10.1016/j.specom.2024.103067

Nutan Singh, Priyanka Tripathi

Speech Communication, vol. 159, Article 103067. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0167639324000396

Citations: 0

Abstract

Parkinson's Disease (PD) is a progressive neurodegenerative disorder that causes both motor and non-motor symptoms. Its symptoms develop slowly, making early identification difficult. Machine learning has significant potential to predict Parkinson's Disease from features hidden in voice data. This work aimed to identify the most relevant features in a high-dimensional dataset so that Parkinson's Disease can be classified accurately with less computation time. Three separate voice-based datasets with various medical features were analyzed. We propose an Ensemble Feature Selection Algorithm (EFSA) that combines filter, wrapper, and embedded methods to pick the features most relevant to identifying Parkinson's Disease, and we validate it on all three datasets. Feature selection of this kind can shorten training time, improve model accuracy, and reduce overfitting. We used several ML models: K-Nearest Neighbors (KNN), Random Forest, Decision Tree, Support Vector Machine (SVM), Bagging Classifier, Multi-Layer Perceptron (MLP) Classifier, and Gradient Boosting. Each model was fine-tuned for this task. In addition to these established classifiers, we propose an ensemble classifier based on majority voting. Dataset-I achieves an accuracy of 97.6 %, an F₁-score of 97.9 %, a precision of 98 %, and a recall of 98 %. Dataset-II achieves an accuracy of 90.2 %, an F₁-score of 90.2 %, a precision of 90.2 %, and a recall of 90.5 %. Dataset-III achieves an accuracy of 83.3 %, an F₁-score of 83.3 %, a precision of 83.5 %, and a recall of 83.3 %. These results were obtained using 13 of 23, 45 of 754, and 17 of 46 features from the respective datasets. On each dataset, the proposed EFSA model achieved higher accuracy and was more efficient than the other models.
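The two voting steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the filter, wrapper, and embedded selections are combined, so the rule below (keep a feature picked by at least two of the three selectors) is an assumption, as are the function names `ensemble_select` and `majority_vote`, the feature names, and the toy predictions. It is not the authors' implementation.

```python
from collections import Counter


def ensemble_select(selections, min_votes=2):
    """Keep features chosen by at least `min_votes` selectors (assumed rule)."""
    # Count each feature at most once per selector.
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def majority_vote(predictions):
    """Hard voting: for each sample, return the label most classifiers chose."""
    voted = []
    for sample_preds in zip(*predictions):  # one tuple of labels per sample
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return voted


# Hypothetical outputs of a filter, a wrapper, and an embedded selector.
filter_pick = ["jitter", "shimmer", "hnr", "rpde"]
wrapper_pick = ["jitter", "hnr", "dfa", "ppe"]
embedded_pick = ["shimmer", "hnr", "ppe", "spread1"]
selected = ensemble_select([filter_pick, wrapper_pick, embedded_pick])

# Hypothetical predictions of three classifiers on four samples (1 = PD).
preds = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
final = majority_vote(preds)

print(selected)
print(final)
```

With an odd number of binary classifiers there are no ties; with an even number, a tie-breaking rule (for example, averaging predicted probabilities) would have to be chosen.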

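Source journal: Speech Communication (Engineering & Technology; Computer Science, Interdisciplinary Applications)
CiteScore: 6.80
Self-citation rate: 6.20%
Publication volume:
Review time: 19.2 weeks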

About the journal: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's primary objectives are: • to present a forum for the advancement of human and human-machine speech communication science; • to stimulate cross-fertilization between different fields of this domain; • to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.