Marker genes identification and prediction of Parkinson's disease by integrating blood-based multi-omics data
Authors: Jisha Augustine, A.S. Jereesh
Journal: Chemometrics and Intelligent Laboratory Systems, Volume 265, Article 105478
Publication date: 2025-07-04
DOI: 10.1016/j.chemolab.2025.105478
URL: https://www.sciencedirect.com/science/article/pii/S0169743925001637
Impact factor: 3.7 (JCR Q2, Automation & Control Systems)
Citation count: 0
Abstract
Parkinson's disease (PD) is a rapidly progressing neurodegenerative disease marked by a combination of motor and non-motor symptoms. The molecular mechanism of PD remains unexplained, and there is currently no genetic risk factor with clinically proven reliability. Diagnosis has therefore relied chiefly on brain imaging and clinical tests. Understanding the molecular-level mechanism of PD is challenging, primarily because of the complexities involved in sampling the posterior brains of both typical individuals and those with PD. However, several independent studies have recently produced and assessed extensive omics data obtained from blood samples, which makes diagnosis cheaper and less invasive. Developing diagnostic or predictive methods for PD that use these data is therefore necessary. In addition, integrating omics data can be a valuable asset for a comprehensive understanding of the disease. This research devised a computational approach to predict PD by integrating gene expression and DNA methylation datasets. The main challenges were the high dimensionality of the data and the heterogeneity of the data sources. A two-level statistical approach is proposed to identify differentially expressed and methylated genes. The Archimedes Optimization Algorithm, a meta-heuristic algorithm, selects 17 optimal genes and 18 mapping CpG sites. A clustering-based method is proposed to integrate the heterogeneous omics data. PD and healthy samples are classified using the TabNet classification model. The proposed approach achieved an ROC-AUC of 0.7615 and an F1-score of 0.7325 on test data. The significance of the work is supported by biological analysis and assessment metrics.
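The two metrics reported in the abstract, ROC-AUC and F1-score, can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code. The toy labels and scores are invented for illustration, and the label coding (1 = PD, 0 = healthy) and the 0.5 decision threshold are assumptions.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data (assumption): 1 = PD, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]   # hypothetical classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"ROC-AUC: {roc_auc(y_true, scores):.4f}")
print(f"F1-score: {f1_score(y_true, y_pred):.4f}")
```

ROC-AUC is computed from the ranking of scores and does not depend on a threshold, whereas F1 depends on the chosen cut-off, which is why the two numbers are usually reported together.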
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, Systems Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration etc.) Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well characterized data sets to test performance for the new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.