Identification and validation of biomarkers in Alzheimer's disease based on machine learning algorithms and single-cell sequencing analysis
Yun Fan, XiaoLong Wang, Yun Ling, QiuYi Wang, XiBin Zhou, Kai Li, ChunXiang Zhou
Computational Biology and Chemistry, volume 118, Article 108475. Published 2025-04-23.
Journal impact factor: 2.6 (JCR Q2, Biology)
DOI: 10.1016/j.compbiolchem.2025.108475
URL: https://www.sciencedirect.com/science/article/pii/S1476927125001355
Citations: 0
Abstract
Objective
Alzheimer's disease (AD) is a complex neurodegenerative disease whose pathogenesis remains unclear. Identifying candidate diagnostic markers of AD is essential to elucidate its mechanisms and facilitate diagnosis.
Methods
A total of 295 samples (153 AD and 142 normal) from two datasets (GSE122063 and GSE132903) in the Gene Expression Omnibus (GEO) database were analyzed. Differentially expressed genes (DEGs) between the groups were identified, and three machine learning algorithms were then used for dimensionality reduction to select feature genes (key genes): least absolute shrinkage and selection operator (LASSO), support vector machine-recursive feature elimination (SVM-RFE), and random forest (RF). In addition, sample data from the single-cell RNA datasets GSE157827, GSE167490, and GSE174367 were used to classify cells into different types and to examine changes in gene expression and their correlation with AD progression. An immunofluorescence assay was used to verify the expression of key genes in animal experiments.
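The three-algorithm selection step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn, synthetic data from `make_classification` in place of the GEO expression matrices, an L1-penalised logistic regression as a stand-in for LASSO on a binary outcome, and a hypothetical cut-off `n_keep` for SVM-RFE and random forest. Taking the intersection of the three gene sets is also an assumption; the abstract does not say how the three results were combined.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_consensus_features(X, y, n_keep=10, seed=0):
    """Return per-method feature index sets and their intersection.

    n_keep is a hypothetical cut-off, not a value taken from the paper.
    """
    X = StandardScaler().fit_transform(X)

    # LASSO stand-in: L1-penalised logistic regression, penalty strength
    # chosen by 5-fold cross-validation; keep features with non-zero weight.
    lasso = LogisticRegressionCV(
        penalty="l1", solver="liblinear", Cs=10, cv=5, random_state=seed
    ).fit(X, y)
    lasso_idx = {int(i) for i in np.flatnonzero(lasso.coef_.ravel() != 0)}

    # SVM-RFE: repeatedly drop the feature with the smallest linear-SVM weight.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_keep, step=1).fit(X, y)
    svm_idx = {int(i) for i in np.flatnonzero(rfe.support_)}

    # Random forest: keep the n_keep features with the highest importance.
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    rf_idx = {int(i) for i in np.argsort(rf.feature_importances_)[::-1][:n_keep]}

    per_method = {"lasso": lasso_idx, "svm_rfe": svm_idx, "random_forest": rf_idx}
    return per_method, lasso_idx & svm_idx & rf_idx


# Synthetic stand-in for a samples-by-DEGs expression matrix (not real data).
X, y = make_classification(
    n_samples=120, n_features=40, n_informative=5, n_redundant=0,
    shuffle=False, random_state=0,
)
per_method, consensus = select_consensus_features(X, y, n_keep=10)
print({name: len(idx) for name, idx in per_method.items()})
print("consensus feature indices:", sorted(consensus))
```

In a real analysis the columns of `X` would be the DEGs and the returned indices would map back to gene symbols.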
Results
To identify diagnostic genes associated with AD, we analyzed two datasets and identified 379 DEGs that might be related to the onset of AD, of which 115 were up-regulated and 264 were down-regulated. Three machine learning algorithms were used to reduce the dimensionality of these DEGs, and six core DEGs were finally identified: CD86, SCG3, VGF, PRKCG, SPP1, and TPI1. Diagnostic analyses showed that SCG3 was substantially down-regulated in the AD group and that its AUC was high across the training and validation sets (0.845, 0.927, and 0.917). Transcriptome sequencing results further revealed that SCG3 expression was down-regulated in multiple cell types in the AD group, and SCG3 expression in the hippocampus was significantly reduced in the AD group.
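For readers unfamiliar with the AUC values quoted above, the following minimal sketch shows how a single-gene AUC can be computed from expression values. It is not the authors' code, and the expression numbers are made up for illustration. The AUC is the probability that a randomly chosen AD sample scores higher than a randomly chosen control, with ties counted as one half. Because SCG3 is reported as down-regulated in AD, the negated expression is used as the score.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (here AD), 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values for one gene (not data from the paper).
ad_expr = [1.0, 1.2, 0.8, 1.5]
control_expr = [2.0, 1.4, 1.9, 1.1]

# Lower expression should indicate AD, so negate expression to get a score.
scores = [-x for x in ad_expr + control_expr]
labels = [1] * len(ad_expr) + [0] * len(control_expr)

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.4f}")  # AUC = 0.8125
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.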
Conclusions
This study systematically identified and validated SCG3 as a potential early diagnostic biomarker for AD using several technical strategies. The findings provide a new candidate biomarker for early detection of AD and lay a foundation for future clinical applications.
About the journal:
Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered.
Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, complementary techniques, for example molecular dynamics simulations along with detailed free energy calculations, should be used to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered.
Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.