Optimizing Support Vector Machine Performance for Parkinson's Disease Diagnosis Using GridSearchCV and PCA-Based Feature Extraction

Jumanto Jumanto, Rofik Rofik, E. Sugiharti, A. Alamsyah, R. Arifudin, Budi Prasetiyo, M. A. Muslim
Journal of Information Systems Engineering and Business Intelligence. Published 2024-02-28. DOI: 10.20473/jisebi.10.1.38-50 (https://doi.org/10.20473/jisebi.10.1.38-50)

Abstract

Background: Parkinson's disease (PD) is a critical neurodegenerative disorder that affects the central nervous system and often impairs movement and cognitive function. Early-stage diagnosis is complex and time-consuming because existing tests, such as electroencephalography or blood examinations, lack effectiveness and accuracy. Several studies have explored PD prediction from voice recordings, focusing on classification models to improve accuracy. Most of them neglected feature extraction and proper parameter tuning, which led to low accuracy.

Objective: This study aims to optimize the performance of voice-based PD prediction through feature extraction, reducing data dimensionality and improving the computational efficiency of the model. In addition, appropriate parameters are selected to improve the model's ability to identify both PD cases and healthy individuals.

Methods: The proposed model used an OpenML dataset of voice recordings from 31 individuals: 23 PD patients and 8 healthy participants. The experiment started with a baseline SVM, then applied PCA for feature extraction to improve accuracy. The data were subsequently balanced with SMOTE, and GridSearchCV was used to find the best parameter combination for the model.

Results: The proposed model achieved an accuracy of 97.44%, a sensitivity of 100%, and a specificity of 85.71%. This result was obtained with a limited dataset and 10-fold cross-validation tuning, a setup that leaves the model sensitive to the training data.

Conclusion: This study improved prediction accuracy with the SVM+PCA+GridSearchCV+CV method. However, future investigations should consider an appropriate number of folds for a small dataset, explore alternative cross-validation methods, and expand the dataset to improve model generalizability.

Keywords: GridSearchCV, Parkinson's Disease, SVM, PCA, SMOTE, Voice/Speech
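
To make the three reported metrics concrete, the sketch below computes accuracy, sensitivity, and specificity from confusion-matrix counts. The counts are hypothetical: they were chosen only because they reproduce the reported percentages, and the abstract does not state the actual confusion matrix or test-set size.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # share of PD cases correctly detected
        "specificity": tn / (tn + fp),    # share of healthy cases correctly cleared
    }


# Hypothetical counts (an assumption, not taken from the paper).
metrics = classification_metrics(tp=32, fn=0, tn=6, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 97.44%
# sensitivity: 100.00%
# specificity: 85.71%
```

With counts like these, one misclassified healthy sample moves specificity by about 14 percentage points, which illustrates why results on a small, imbalanced test set are sensitive to the data split.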