Feature selection based on Mahalanobis distance for early Parkinson disease classification

Mustafa Noaman Kadhim, Dhiah Al-Shammary, Ahmed M. Mahdi, Ayman Ibaida

Computer Methods and Programs in Biomedicine Update, Volume 7, 2025, Article 100177. DOI: 10.1016/j.cmpbup.2025.100177. Available at: https://www.sciencedirect.com/science/article/pii/S2666990025000011
Standard classifiers struggle with high-dimensional datasets due to increased computational complexity, difficulty in visualization and interpretation, and challenges in handling redundant or irrelevant features. This paper proposes a novel feature selection method based on the Mahalanobis distance for Parkinson's disease (PD) classification. The proposed method identifies relevant features by measuring their distance from the dataset's mean vector while accounting for the covariance structure. Features with larger Mahalanobis distances are deemed more relevant, as they exhibit greater discriminative power relative to the dataset's distribution, supporting effective feature subset selection. Significant improvements in classification performance were observed across all models. On the "Parkinson Disease Classification Dataset", the feature set was reduced from 22 to 11 features, yielding accuracy improvements ranging from 10.17 % to 20.34 %, with the K-Nearest Neighbors (KNN) classifier achieving the highest accuracy of 98.31 %. Similarly, on the "Parkinson Dataset with Replicated Acoustic Features", the feature set was reduced from 45 to 18 features, yielding accuracy improvements ranging from 1.38 % to 13.88 %, with the Random Forest (RF) classifier achieving the best accuracy of 95.83 %. By retaining convergence features and eliminating divergence features, the proposed method effectively reduces dimensionality while maintaining or improving classifier performance. Additionally, it significantly reduces execution time, making it well suited for real-time applications in medical diagnostics, where timely and accurate disease identification is critical for improving patient outcomes.
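The abstract describes the scoring idea only at a high level, so the following is a minimal sketch of one possible reading, not the authors' reference implementation: each feature column is treated as an observation in sample space, scored by its Mahalanobis distance from the mean feature profile under the corresponding covariance, and the top-k highest-scoring features are kept. The standardization step, the pseudo-inverse regularization, the top-k cutoff (half the features, mirroring the 22-to-11 reduction reported above), and the exact orientation of the distance computation are all assumptions.

```python
# Hedged sketch of Mahalanobis-distance feature scoring (assumed reading of the
# abstract, not the paper's exact procedure).
import numpy as np


def mahalanobis_feature_scores(X: np.ndarray) -> np.ndarray:
    """Score each feature (column of X) by its Mahalanobis distance from the
    mean feature profile, using the covariance of the feature profiles."""
    # Standardize columns so scale differences do not dominate the distance
    # (assumed preprocessing step).
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

    # Treat each feature as a point in sample space: rows of F are features.
    F = Xs.T                             # shape: (n_features, n_samples)
    mu = F.mean(axis=0)                  # mean profile across features
    cov = np.cov(F, rowvar=False)        # (n_samples, n_samples) covariance
    cov_inv = np.linalg.pinv(cov)        # pseudo-inverse for numerical stability

    diffs = F - mu
    # Squared Mahalanobis distance of each feature profile from the mean profile.
    d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
    return np.sqrt(np.maximum(d2, 0.0))


def select_top_k_features(X: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k features with the largest Mahalanobis scores."""
    scores = mahalanobis_feature_scores(X)
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 22))            # placeholder data, 22 features
    keep = select_top_k_features(X, k=11)     # keep half, as in the reported reduction
    X_reduced = X[:, keep]
    print("selected feature indices:", keep)
    print("reduced shape:", X_reduced.shape)
```

The reduced matrix `X_reduced` would then be passed to an off-the-shelf classifier such as KNN or Random Forest, matching the evaluation setup summarized in the abstract.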