Multi-view clustering for single-cell RNA-seq data based on graph fusion
Authors: Jing Wang, Junfeng Xia, Dayu Tan, Yunjie Ma, Yansen Su, Chun-Hou Zheng
Journal: Briefings in Bioinformatics, 26(3), published 2025-05-01
DOI: https://doi.org/10.1093/bib/bbaf193
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12103903/pdf/
Single-cell RNA sequencing (scRNA-seq) provides transcriptome profiling of individual cells, enabling in-depth studies of cell heterogeneity at single-cell resolution. Although cell clustering is the foundation of scRNA-seq data analysis, the high dimensionality and frequent dropout events of the data pose major challenges. Many dedicated clustering methods have been proposed, but they often fail to fully explore the underlying data structure. Here, we introduce scMCGF, a new multi-view clustering algorithm based on graph fusion. It uses multi-view data generated from transcriptomic data to learn the consistent and complementary information across different views, ultimately constructing a unified graph matrix for robust cell clustering. Specifically, scMCGF uses two dimensionality-reduction methods (principal component analysis and diffusion maps) to capture both linear and non-linear characteristics of the data. Additionally, it calculates a cell-pathway score matrix to incorporate pathway-level information. These three feature sets, along with the pre-processed gene expression data, form the multi-view data. scMCGF iteratively refines the structure of the similarity graph of each view through adaptive learning and learns a unified graph matrix by weighting and fusing the individual similarity graph matrices. The final clustering results are obtained by applying a rank constraint to the Laplacian matrix of the unified graph matrix. Experimental results on 13 real data sets show that scMCGF outperforms eight state-of-the-art methods in clustering accuracy and robustness. Furthermore, biological analysis validates that the clustering results of scMCGF provide a reliable foundation for downstream investigations.
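The pipeline described in the abstract (several views, one similarity graph per view, weighted fusion, then a look at the Laplacian of the fused graph) can be illustrated with a minimal NumPy sketch. This is not the scMCGF implementation. Every function name, the Gaussian k-nearest-neighbour graph, the inverse-distance weighting rule, and the toy gene sets are assumptions made for illustration. The diffusion-map view is omitted for brevity, and the rank constraint is not enforced. The sketch only checks the property that constraint relies on: the number of zero eigenvalues of a graph Laplacian equals the number of connected components of the graph.

```python
import numpy as np

def pca_view(X, n_components=2):
    """Linear view: project centered data onto the top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def pathway_view(X, gene_sets):
    """Toy cell-by-pathway score: mean expression over each (hypothetical) gene set."""
    return np.column_stack([X[:, list(g)].mean(axis=1) for g in gene_sets])

def knn_similarity(X, k=5):
    """Symmetric Gaussian similarity graph over k nearest neighbours (assumed kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)                 # exclude self-loops
    idx = np.argsort(d2, axis=1)[:, :k]          # k nearest neighbours per cell
    rows = np.repeat(np.arange(X.shape[0]), k)
    cols = idx.ravel()
    sigma2 = d2[rows, cols].mean() + 1e-12       # global bandwidth
    S = np.zeros_like(d2)
    S[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return (S + S.T) / 2.0

def fuse_graphs(graphs, n_iter=10):
    """Weighted fusion: views closer to the consensus graph receive larger weights."""
    w = np.full(len(graphs), 1.0 / len(graphs))
    for _ in range(n_iter):
        U = sum(wi * S for wi, S in zip(w, graphs))
        dist = np.array([np.linalg.norm(U - S) for S in graphs])
        w = 1.0 / (2.0 * dist + 1e-12)           # assumed weighting rule
        w /= w.sum()
    return sum(wi * S for wi, S in zip(w, graphs)), w

def count_components(U, tol=1e-8):
    """Count near-zero Laplacian eigenvalues (= connected components of the graph)."""
    L = np.diag(U.sum(axis=1)) - U
    return int(np.sum(np.linalg.eigvalsh(L) < tol))

def component_labels(U, tol=1e-12):
    """Label cells by connected component of the fused graph (depth-first search)."""
    n = U.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(U[i] > tol):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "expression" matrix: two well-separated groups of 20 cells, 10 genes.
    X = np.vstack([rng.normal(0.0, 1.0, (20, 10)), rng.normal(10.0, 1.0, (20, 10))])
    gene_sets = [list(range(0, 5)), list(range(5, 10))]   # hypothetical pathways
    views = [X, pca_view(X), pathway_view(X, gene_sets)]
    U, w = fuse_graphs([knn_similarity(V) for V in views])
    print("view weights:", w)
    print("zero Laplacian eigenvalues:", count_components(U))
    print("labels:", component_labels(U))
```

In this sketch the cluster labels fall out of the connected components only because the toy groups are far apart. On real data the fused graph is usually connected, which is why a method of this kind optimizes the graph under a rank constraint rather than merely inspecting it.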
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.