Complex hierarchical structures analysis in single-cell data with Poincaré deep manifold transformation
Yongjie Xu, Zelin Zang, Bozhen Hu, Yue Yuan, Cheng Tan, Jun Xia, Stan Z Li
Briefings in Bioinformatics, 26(1). Published 2024-11-22.
DOI: https://doi.org/10.1093/bib/bbae687
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11757945/pdf/
Citations: 0
Abstract
Single-cell RNA sequencing (scRNA-seq) offers remarkable insights into cellular development and differentiation by capturing the gene expression profiles of individual cells. The role of dimensionality reduction and visualization in the interpretation of scRNA-seq data has gained wide acceptance. However, current methods rely on incomplete structure-preserving strategies and produce highly distorted embeddings, so they fail to model complex cell trajectories with multiple branches effectively. To address these issues, we propose the Poincaré deep manifold transformation (PoincaréDMT) method, which maps high-dimensional scRNA-seq data to a hyperbolic Poincaré disk. This approach preserves global structure derived from a graph Laplacian matrix while correcting local structure through a structure module combined with data augmentation. Additionally, PoincaréDMT alleviates batch effects by integrating a batch graph that accounts for batch labels into the low-dimensional embeddings during network training. Furthermore, PoincaréDMT applies the Shapley additive explanations (SHAP) method to the trained model to identify important marker genes for specific clusters and cell differentiation processes. PoincaréDMT thus provides a unified framework for multiple key tasks in scRNA-seq analysis, including trajectory inference, pseudotime inference, batch correction, and marker gene selection. We validate PoincaréDMT through extensive evaluations on both simulated and real scRNA-seq datasets, demonstrating its superior performance in preserving global and local data structures compared to existing methods.
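The abstract does not spell out the geometry of the Poincaré disk, so the following is only a minimal sketch of the standard Poincaré distance, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), and not the authors' implementation. The function name `poincare_distance` and the sample points are hypothetical and chosen purely for illustration. It shows why a hyperbolic disk may suit branching trajectories: the same Euclidean gap corresponds to a much larger distance near the boundary, which leaves room for many diverging branches.

```python
import math


def poincare_distance(u, v, eps=1e-12):
    """Geodesic distance between two points strictly inside the unit Poincaré disk.

    Hypothetical helper for illustration; uses the standard closed-form metric.
    """
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    norm_u = sum(a * a for a in u)
    norm_v = sum(b * b for b in v)
    if norm_u >= 1.0 or norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit disk")
    # Guard against division by a vanishing denominator very close to the boundary.
    denom = max((1.0 - norm_u) * (1.0 - norm_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


# Two pairs of points with the same Euclidean separation (0.1 along the x-axis).
near_center = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_edge = poincare_distance((0.85, 0.0), (0.95, 0.0))

print(f"near center: {near_center:.4f}")
print(f"near edge:   {near_edge:.4f}")
```

In this sketch the pair near the edge is several times farther apart in hyperbolic terms than the pair near the center, even though both pairs are 0.1 apart in Euclidean terms. In an embedding of this kind, early progenitor cells would typically sit near the center and terminal cell states near the boundary, although how PoincaréDMT arranges cells is determined by its trained network and is not reproduced here.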
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.