MicroCellClust 2: a hybrid approach for multivariate rare cell mining in large-scale single-cell data
Alexander Gerniers, P. Dupont
2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2022-12-06
DOI: https://doi.org/10.1109/BIBM55620.2022.9995176
Citations: 0
Abstract
Identifying rare subpopulations is a key part of analyzing heterogeneity in single-cell data. With large datasets now commonly generated, the design of rare cell mining methods has shifted its focus to scalability, often relying on univariate approaches. Yet MicroCellClust, an approach based on a multivariate optimization problem, has proven effective at jointly identifying rare cells and specific genes in small-scale data. Its original solver had quadratic complexity, which in practice limited it to small or medium-scale data. Here, we present a new approach that scales MicroCellClust to larger datasets. It first performs a beam search among cells identified as rare to find an initial approximation. It then applies simulated annealing, a classical derivative-free optimization algorithm, to efficiently approach the optimal solution. MicroCellClust 2 has linear complexity in the number of cells, which makes it scalable to large datasets (typically containing over 100,000 cells). Our experiments report the identification of rare megakaryocytes within 68,000 PBMCs and rare ependymal cells within 160,000 mouse brain cells. These results show that MicroCellClust 2 is more effective at identifying a subpopulation as a whole than typical alternatives, demonstrating the usefulness of jointly selecting cells and genes compared with other approaches.
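
As an illustration of the refinement stage, the following is a minimal sketch of simulated annealing over a set of selected cells. It is not the MicroCellClust 2 implementation: the abstract does not state the objective function, so `toy_score` is a hypothetical stand-in (the sum of expression values over the selected cells and a fixed gene set), and the names `anneal`, `t0`, and `cooling` are assumptions made for this example. The initial cell set plays the role of the approximation that a beam search would provide.

```python
import math
import random


def toy_score(cells, genes, expr):
    """Hypothetical objective: total expression in the selected cells x genes block."""
    return sum(expr[c][g] for c in cells for g in genes)


def anneal(expr, genes, init_cells, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Refine an initial cell set by simulated annealing (illustrative only).

    expr       : list of rows, one per cell, holding integer expression values
    genes      : indices of the genes kept fixed during the search (assumption)
    init_cells : starting cell set, e.g. the output of an earlier search stage
    """
    rng = random.Random(seed)
    current = set(init_cells)
    cur_score = toy_score(current, genes, expr)
    best, best_score = set(current), cur_score
    temp = t0
    for _ in range(n_iter):
        # Propose flipping the membership of one randomly chosen cell.
        c = rng.randrange(len(expr))
        delta = sum(expr[c][g] for g in genes)
        if c in current:
            delta = -delta  # removing the cell subtracts its contribution
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature decreases.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current ^= {c}
            cur_score += delta
            if cur_score > best_score:
                best, best_score = set(current), cur_score
        temp *= cooling  # geometric cooling schedule
    return best, best_score
```

In this toy setting each proposal only touches one cell's values on the fixed gene set, so the cost of an iteration does not depend on the number of cells. Whether the published solver uses the same move set or cooling schedule is not stated in the abstract.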