Analyzing Large-Scale Single-Cell RNA-Seq Data Using Coreset.

Impact factor: 3.6; Zone 3 (Biology); JCR Q2 (Biochemical Research Methods)
Khalid Usman, Fangping Wan, Dan Zhao, Jian Peng, Jianyang Zeng
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication date: 2024-06-24
Publication type: Journal Article
DOI: 10.1109/TCBB.2024.3418078 (https://doi.org/10.1109/TCBB.2024.3418078)
Citations: 0

Abstract

The recent boom in single-cell sequencing technologies provides valuable insights into the transcriptomes of individual cells. Single-cell data analyses have led to a number of biological discoveries, such as novel cell types, developmental cell lineage trajectories, and gene regulatory networks. However, the massive and steadily accumulating single-cell datasets also pose a serious computational and analytical challenge for researchers. To address this issue, one typically applies dimensionality reduction approaches to shrink the large-scale datasets. However, these approaches are generally computationally infeasible for tall matrices. In addition, downstream data analysis tasks such as clustering still have high time complexity even on the dimension-reduced datasets. We present single-cell Coreset (scCoreset), a data summarization framework that extracts a small weighted subset of cells from a huge, sparse single-cell RNA-seq dataset to facilitate downstream data analysis tasks. Single-cell data analyses run on the extracted subset yield results similar to those derived from the original uncompressed data. Tests on various single-cell datasets show that scCoreset outperforms existing data summarization approaches on common downstream tasks such as visualization and clustering. We believe that scCoreset can serve as a useful plug-in tool to improve the efficiency of current single-cell RNA-seq data analyses.
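
To make the idea of a "small weighted subset of cells" concrete, the following is a minimal sketch of a generic importance-sampling coreset (the lightweight coreset construction commonly used for k-means). It is not the scCoreset algorithm, whose details the abstract does not give. The function name `lightweight_coreset`, the dense list-of-lists input, and the toy expression values are all assumptions made for illustration; real single-cell data would normally be held in a sparse matrix.

```python
import random


def lightweight_coreset(cells, m, seed=0):
    """Sample m weighted cells from `cells` (hypothetical helper, not scCoreset).

    `cells` is a list of equal-length expression vectors, one per cell.
    Returns (indices, weights); a weighted downstream method would use
    weights[k] as the importance of cell indices[k].
    """
    rng = random.Random(seed)
    n = len(cells)
    dim = len(cells[0])
    # Mean expression profile across all cells.
    mean = [sum(row[j] for row in cells) / n for j in range(dim)]
    # Squared Euclidean distance of each cell to the mean profile.
    dist2 = [sum((x - mu) ** 2 for x, mu in zip(row, mean)) for row in cells]
    total = sum(dist2)
    if total > 0:
        # Half uniform mass, half mass proportional to distance from the mean,
        # so that unusual cells are more likely to be kept.
        probs = [0.5 / n + 0.5 * d / total for d in dist2]
    else:
        # All cells identical: fall back to uniform sampling.
        probs = [1.0 / n] * n
    # Sample with replacement according to probs.
    indices = rng.choices(range(n), weights=probs, k=m)
    # Inverse-probability weights keep weighted sums unbiased in expectation.
    weights = [1.0 / (m * probs[i]) for i in indices]
    return indices, weights


if __name__ == "__main__":
    # Toy data: six "cells" with two "genes" each (made-up values).
    toy = [[0.0, 1.0], [0.0, 1.2], [5.0, 0.0], [5.1, 0.1], [0.1, 0.9], [9.0, 9.0]]
    idx, w = lightweight_coreset(toy, m=3, seed=1)
    print(idx, w)
```

A downstream step such as clustering would then run on the sampled cells only, with each cell's weight passed to a method that accepts sample weights. Cells far from the mean receive a higher sampling probability and a correspondingly smaller weight, which is what lets a small subset stand in for the full dataset.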

Source journal
CiteScore: 7.50
Self-citation rate: 6.70%
Articles published: 479
Review time: 3 months
About the journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; important biological results that are obtained from the use of these methods, programs and databases; and the emerging field of Systems Biology, where many forms of data are used to create a computer-based model of a complex biological system.