CSI-GEP: A GPU-based unsupervised machine learning approach for recovering gene expression programs in atlas-scale single-cell RNA-seq data.

Impact factor: 11.1 (JCR Q1, Cell Biology)
Xueying Liu, Richard H Chapple, Declan Bennett, William C Wright, Ankita Sanjali, Erielle Culp, Yinwen Zhang, Min Pan, Paul Geeleher
Cell Genomics, volume 5, issue 1, article 100739. Published 2025-01-08. Journal Article.
DOI: 10.1016/j.xgen.2024.100739 (https://doi.org/10.1016/j.xgen.2024.100739)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11770216/pdf/

Abstract

Exploratory analysis of single-cell RNA sequencing (scRNA-seq) typically relies on hard clustering over two-dimensional projections like uniform manifold approximation and projection (UMAP). However, such methods can severely distort the data and have many arbitrary parameter choices. Methods that can model scRNA-seq data as non-discrete "gene expression programs" (GEPs) can better preserve the data's structure, but currently, they are often not scalable, not consistent across repeated runs, and lack an established method for choosing key parameters. Here, we developed a GPU-based unsupervised learning approach, "consensus and scalable inference of gene expression programs" (CSI-GEP). We show that CSI-GEP can recover ground truth GEPs in real and simulated atlas-scale scRNA-seq datasets, significantly outperforming cutting-edge methods, including GPT-based neural networks. We applied CSI-GEP to a whole mouse brain atlas of 2.2 million cells, disentangling endothelial cell types missed by other methods, and to an integrated scRNA-seq atlas of human tumors and cell lines, discovering mesenchymal-like GEPs unique to cancer cells growing in culture.
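To make the contrast between hard clustering and non-discrete gene expression programs concrete, here is a minimal sketch. It is not the CSI-GEP implementation: the abstract does not name the underlying factorization, so this example assumes, purely for illustration, a plain non-negative matrix factorization with multiplicative updates, run on the CPU with NumPy on simulated data. The function name `nmf`, the toy dimensions, and all parameter values are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (cells x genes) as W @ H.

    Illustrative assumption only: multiplicative-update NMF, not the
    method described in the paper.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    W = rng.random((n_cells, k)) + eps  # per-cell usage of each program
    H = rng.random((k, n_genes)) + eps  # per-program gene weights
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data: 100 cells, 30 genes, each cell a mixture of 2 programs.
rng = np.random.default_rng(42)
true_programs = rng.random((2, 30))
true_usage = rng.dirichlet([1.0, 1.0], size=100)
X = true_usage @ true_programs

W, H = nmf(X, k=2)

# Non-discrete view: each cell gets a fractional usage of every program.
usage = W / W.sum(axis=1, keepdims=True)

# Hard-clustering view: each cell is forced into a single label,
# which discards the mixture information held in `usage`.
hard_labels = usage.argmax(axis=1)

# Relative reconstruction error of the factorization.
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

In this sketch a cell that uses two programs in roughly equal parts keeps both fractions in `usage`, whereas `hard_labels` reports only one of them. Running the factorization from different seeds can give different solutions, which is the repeated-run inconsistency the abstract mentions.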
