scAGG: Sample-level embedding and classification of Alzheimer's disease from single-nucleus data.
T Verlaan, G A Bouland, A Mahfouz, M J T Reinders
Computational and Structural Biotechnology Journal, volume 27, pages 3753-3761. Published 2025-08-13. DOI: 10.1016/j.csbj.2025.08.009
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448040/pdf/
Identifying key cell types and genes in Alzheimer's Disease (AD) is crucial for understanding its pathogenesis and discovering therapeutic targets. Single-cell RNA sequencing technology (scRNAseq) has provided unprecedented opportunities to study the molecular mechanisms that underlie AD at the cellular level. In this study, we address the problem of sample-level classification of AD using scRNAseq data, where we predict the disease status of entire samples from the gene expression profiles of their cells, which are not necessarily all affected by the disease. We introduce scAGG (single-cell AGGregation), a sample-level classification model that uses a sample-level pooling mechanism to aggregate single-cell embeddings, and show that it can accurately classify AD individuals and healthy controls. We then investigate the latent space learnt by the model and find that the model learns an ordering of the cells corresponding to disease severity. Genes associated with this ordering are enriched in AD-linked pathways, including cytokine signalling, apoptosis, and metal ion response. We also evaluate two attention-based models that perform on par with scAGG, but entropy analysis of their attention scores reveals limited interpretability value. As scRNAseq is increasingly applied to large cohorts and cell-level disease association annotations do not exist, our approach provides a way to classify phenotypes from single-cell measurements. The yielded cell- and sample-level severity scores may enable identification of AD-associated cell subtypes, paving the way for targeted drug development and personalized treatment strategies in AD. Code is available at: https://github.com/timoverlaan/scAGG.
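The two technical ideas in the abstract, pooling cell embeddings into a sample-level embedding and measuring the entropy of attention scores, can be illustrated with a minimal sketch. This is not the code from the scAGG repository. The abstract does not state which pooling operator or entropy definition the authors use, so mean pooling and Shannon entropy normalised by log(n) are assumptions here, and the function names are hypothetical.

```python
import numpy as np


def mean_pool_by_sample(cell_embeddings, sample_ids):
    """Average the cell embeddings of each sample (assumed pooling rule).

    cell_embeddings: array of shape (n_cells, n_features)
    sample_ids:      array of shape (n_cells,), one sample label per cell
    Returns the sorted unique sample labels and one pooled vector per sample.
    """
    cell_embeddings = np.asarray(cell_embeddings, dtype=float)
    samples, inverse = np.unique(np.asarray(sample_ids), return_inverse=True)
    sums = np.zeros((len(samples), cell_embeddings.shape[1]))
    # Unbuffered add so that cells sharing a sample index all accumulate.
    np.add.at(sums, inverse, cell_embeddings)
    counts = np.bincount(inverse, minlength=len(samples))
    return samples, sums / counts[:, None]


def normalized_attention_entropy(weights):
    """Shannon entropy of one sample's attention weights, scaled to [0, 1].

    Assumed definition: weights are renormalised to sum to 1 and the
    entropy is divided by log(n_cells), its maximum possible value.
    """
    w = np.asarray(weights, dtype=float)
    if len(w) < 2:
        return 0.0
    w = w / w.sum()
    nonzero = w[w > 0]  # treat 0 * log(0) as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(entropy / np.log(len(w)))


# Toy data: six cells with four latent features, drawn from two samples.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
ids = np.array(["s1", "s1", "s1", "s2", "s2", "s2"])
samples, pooled = mean_pool_by_sample(emb, ids)
```

In this sketch, the pooled vector is what a sample-level classifier would take as input. A normalised entropy near 1 means the attention weights are close to uniform across cells, so they single out few cells; a value near 0 means the weight is concentrated on very few cells.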
About the journal:
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a pre-requisite for publication in the journal. Specific areas of interest include, but are not limited to:
Structure and function of proteins, nucleic acids and other macromolecules
Structure and function of multi-component complexes
Protein folding, processing and degradation
Enzymology
Computational and structural studies of plant systems
Microbial Informatics
Genomics
Proteomics
Metabolomics
Algorithms and Hypothesis in Bioinformatics
Mathematical and Theoretical Biology
Computational Chemistry and Drug Discovery
Microscopy and Molecular Imaging
Nanotechnology
Systems and Synthetic Biology