Processing Millions of Single Cells by SHARP
Shibiao Wan, Junil Kim, Yiping Fan, Kyoung-Jae Won
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2020-09-21
https://doi.org/10.1145/3388440.3414214

Single-cell technologies have received extensive attention from the bioinformatics and computational biology communities because of their impact on uncovering novel cell types and intra-population heterogeneity in various domains of biology and medicine. Recent advances in single-cell RNA-sequencing (scRNA-seq) technologies have enabled parallel transcriptomic profiling of millions of cells. However, existing scRNA-seq clustering methods lack scalability, are time-consuming, and are prone to information loss during dimension reduction. To address these concerns, we present SHARP [1], an ensemble random projection-based algorithm that scales to clustering 10 million cells. By adopting a divide-and-conquer strategy, sparse random projection, and two-layer meta-clustering, SHARP offers the following advantages: (1) it is substantially faster than existing algorithms; (2) it scales to 10 million cells; (3) it is accurate in terms of clustering performance; (4) it preserves cell-to-cell distances during dimension reduction; and (5) it is robust to dropouts in scRNA-seq data. Comprehensive benchmarking tests on 20 scRNA-seq datasets demonstrate that SHARP outperforms state-of-the-art methods in both speed and accuracy. To the best of our knowledge, SHARP is the only R-based tool that scales to clustering 10 million cells. With an avalanche of single cells from different tissues to be sequenced in international projects such as the Human Cell Atlas, we believe SHARP will serve as a useful tool for large-scale single-cell data analysis. Potential future directions include extending SHARP, while keeping its scalability and speed, to rare cell type detection and to the integration of single-cell data from different platforms, experiments, and conditions.
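
The claim that a sparse random projection can reduce dimension while roughly preserving cell-to-cell distances can be illustrated with a small sketch. This is not SHARP's implementation (SHARP is an R tool, and its code is not shown here): the helper names, the toy Poisson matrix standing in for expression data, the target dimension, and the density of 1/sqrt(number of genes) are all assumptions chosen for illustration.

```python
import numpy as np


def sparse_random_projection(n_features, n_components, density, rng):
    """Build a sparse random projection matrix (hypothetical helper, not SHARP's code).

    Each entry is +v or -v with probability density/2 each and 0 otherwise,
    where v = 1/sqrt(density * n_components), so squared Euclidean norms are
    preserved in expectation.
    """
    scale = 1.0 / np.sqrt(density * n_components)
    signs = rng.choice(
        [-1.0, 0.0, 1.0],
        size=(n_features, n_components),
        p=[density / 2, 1.0 - density, density / 2],
    )
    return scale * signs


def pairwise_distances(x):
    """Euclidean distances between all rows of x."""
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.sqrt(np.maximum(d2, 0.0))


rng = np.random.default_rng(0)
n_cells, n_genes, n_components = 60, 2000, 400

# Toy stand-in for a cells x genes expression matrix; not real scRNA-seq data.
cells = rng.poisson(lam=2.0, size=(n_cells, n_genes)).astype(float)

# Assumed density: about 1/sqrt(n_genes) of the entries are non-zero.
projection = sparse_random_projection(
    n_genes, n_components, density=1.0 / np.sqrt(n_genes), rng=rng
)
reduced = cells @ projection  # cells x n_components

original = pairwise_distances(cells)
projected = pairwise_distances(reduced)

# Compare distances over distinct cell pairs only.
upper = np.triu_indices(n_cells, k=1)
ratio = projected[upper] / original[upper]
print(f"mean distance ratio: {ratio.mean():.3f}, std: {ratio.std():.3f}")
```

A ratio close to 1 for every pair means the reduced matrix keeps approximately the same neighborhood structure as the full one, which is what a downstream clustering step relies on. In an ensemble setting, several independently drawn projection matrices would each yield one clustering, to be combined afterwards; that combination step is not sketched here.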