Efficient Clustering of Massive scRNA-seq Data Using a Modified PQk-Means Algorithm
Weinan Liu, Renyi Liu
2021 IEEE 9th International Conference on Bioinformatics and Computational Biology (ICBCB), published 2021-05-25
DOI: 10.1109/ICBCB52223.2021.9459216 (https://doi.org/10.1109/ICBCB52223.2021.9459216)
Abstract
Single-cell RNA sequencing (scRNA-seq) measures gene expression at the single-cell level and has been widely applied in biological and medical research. An important step in scRNA-seq data analysis is to use an unsupervised clustering algorithm to partition cells into clusters based on the similarity of their gene expression profiles, and then to assign a cell type label to each cluster. Recent advances in scRNA-seq technologies have led to a sharp increase in data size, posing a computational challenge to commonly used clustering algorithms that are memory-demanding or computation-intensive. Here, we propose a modified PQk-means algorithm that can greatly reduce both running time and memory usage while providing similar or better partition accuracy when tested on real scRNA-seq datasets.
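The abstract does not describe the authors' modification, so the following is only a minimal sketch of the generic idea usually associated with PQk-means: compress each expression vector into a few one-byte product-quantization codes, then run k-means directly on those codes using precomputed codeword-to-codeword distance tables. It assumes NumPy, a feature count divisible by the number of sub-vectors, and at most 256 codewords per sub-space; the function names `kmeans`, `pq_encode`, and `pq_kmeans` are hypothetical and not taken from the paper.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(c):
        return ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, assign(centers)


def pq_encode(X, n_sub, n_codewords, seed=0):
    """Split each row into n_sub sub-vectors and store one codeword index per sub-vector.

    Assumes X.shape[1] is divisible by n_sub and n_codewords <= 256.
    """
    sub_dim = X.shape[1] // n_sub
    codebooks = []
    codes = np.empty((X.shape[0], n_sub), dtype=np.uint8)
    for m in range(n_sub):
        block = X[:, m * sub_dim:(m + 1) * sub_dim]
        codebook, idx = kmeans(block, n_codewords, seed=seed + m)
        codebooks.append(codebook)
        codes[:, m] = idx
    return np.stack(codebooks), codes


def pq_kmeans(codes, codebooks, k, n_iter=10, seed=0):
    """k-means in code space: cluster centers are themselves PQ codes."""
    n, n_sub = codes.shape
    n_codewords = codebooks.shape[1]
    # table[m, a, b] = squared distance between codewords a and b of sub-space m
    diff = codebooks[:, :, None, :] - codebooks[:, None, :, :]
    table = (diff ** 2).sum(axis=3)
    sub = np.arange(n_sub)
    rng = np.random.default_rng(seed)
    centers = codes[rng.choice(n, size=k, replace=False)].copy()

    def assign(c):
        # dist[i, j] = sum over m of table[m, codes[i, m], c[j, m]]
        dist = table[sub[None, None, :], codes[:, None, :], c[None, :, :]].sum(axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for j in range(k):
            members = codes[labels == j]
            if len(members) == 0:
                continue
            for m in range(n_sub):
                # Histogram of member codewords, then pick the codeword that
                # minimizes the total squared distance to all members.
                hist = np.bincount(members[:, m], minlength=n_codewords)
                centers[j, m] = (hist @ table[m]).argmin()
    return centers, assign(centers)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a (cells x features) matrix; not real scRNA-seq data.
    X = np.vstack([rng.normal(0, 1, (200, 8)), rng.normal(10, 1, (200, 8))])
    codebooks, codes = pq_encode(X, n_sub=4, n_codewords=16)
    centers, labels = pq_kmeans(codes, codebooks, k=2)
    print("bytes, raw vs. codes:", X.nbytes, codes.nbytes)
    print("cluster sizes:", np.bincount(labels, minlength=2))
```

In this sketch the memory saving comes from storing one byte per sub-vector instead of the full floating-point profile, and the time saving comes from replacing distance computations with table lookups. How the paper's modified algorithm differs from this generic scheme is not stated in the abstract.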