Vulture: cloud-enabled scalable mining of microbial reads in public scRNA-seq data.

Impact factor 11.8 · CAS Zone 2 (Biology) · JCR Q1, MULTIDISCIPLINARY SCIENCES
Junyi Chen, Danqing Yin, Harris Y H Wong, Xin Duan, Ken H O Yu, Joshua W K Ho
Journal: GigaScience
Published: 2024-01-02
DOI: 10.1093/gigascience/giad117
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10776309/pdf/
Citations: 0

Abstract

The rapidly growing collection of public single-cell sequencing data has become a valuable resource for molecular, cellular, and microbial discovery. Previous studies mostly overlooked detecting pathogens in human single-cell sequencing data. Moreover, existing bioinformatics tools lack the scalability to deal with big public data. We introduce Vulture, a scalable cloud-based pipeline that performs microbial calling for single-cell RNA sequencing (scRNA-seq) data, enabling meta-analysis of host-microbial studies from the public domain. In our benchmarking experiments, Vulture is 66% to 88% faster than local tools (PathogenTrack and Venus) and 41% faster than the state-of-the-art cloud-based tool Cumulus, while achieving comparable microbial read identification. In terms of the cost on cloud computing systems, Vulture also shows a cost reduction of 83% ($12 vs. $70). We applied Vulture to 2 coronavirus disease 2019, 3 hepatocellular carcinoma (HCC), and 2 gastric cancer human patient cohorts with public sequencing read data from scRNA-seq experiments and discovered cell type-specific enrichment of severe acute respiratory syndrome coronavirus 2, hepatitis B virus (HBV), and Helicobacter pylori-positive cells, respectively. In the HCC analysis, all cohorts showed hepatocyte-only enrichment of HBV, with cell subtype-associated HBV enrichment based on inferred copy number variations. In summary, Vulture presents a scalable and economical framework to mine unknown host-microbial interactions from large-scale public scRNA-seq data. Vulture is available via an open-source license at https://github.com/holab-hku/Vulture.
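The abstract reports a cost reduction and a cell type-specific enrichment of microbe-positive cells without giving the underlying formulas. The Python sketch below makes both concrete under stated assumptions: the cost figure is checked as a relative reduction, (70 - 12) / 70 ≈ 0.83, and enrichment is illustrated with a one-sided hypergeometric test. The hypergeometric test is an assumed stand-in, since the abstract does not say which statistical test the authors used; the function names `percent_reduction` and `hypergeom_upper_tail` and all cell counts are hypothetical.

```python
from math import comb


def percent_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` versus `baseline`, in percent (hypothetical helper)."""
    return (baseline - new) / baseline * 100.0


def hypergeom_upper_tail(k: int, n_type: int, k_total: int, n_total: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) (assumed enrichment test).

    X is the number of microbe-positive cells among the n_type cells of one
    cell type, when n_type cells are drawn without replacement from n_total
    cells of which k_total are microbe-positive.
    """
    upper = min(n_type, k_total)
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # impossible terms contribute nothing to the sum.
    tail = sum(
        comb(k_total, i) * comb(n_total - k_total, n_type - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_total, n_type)


if __name__ == "__main__":
    # Cost figures quoted in the abstract: $12 for Vulture vs. $70.
    print(f"cost reduction: {percent_reduction(70, 12):.0f}%")  # prints "cost reduction: 83%"

    # Hypothetical per-cell-type counts: (microbe-positive cells, total cells).
    counts = {
        "Hepatocyte": (40, 200),
        "T cell": (3, 300),
        "Myeloid": (2, 250),
    }
    k_total = sum(pos for pos, _ in counts.values())
    n_total = sum(n for _, n in counts.values())
    for cell_type, (pos, n) in counts.items():
        p = hypergeom_upper_tail(pos, n, k_total, n_total)
        print(f"{cell_type}: {pos}/{n} positive, enrichment p = {p:.3g}")
```

With these made-up counts only the hepatocyte row yields a small p-value. In a real analysis the counts would come from per-cell microbial calls, and p-values across cell types would usually be corrected for multiple testing.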

Source journal: GigaScience (MULTIDISCIPLINARY SCIENCES)
CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week
About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access, open-data journal, it specializes in publishing "big-data" studies across many fields. Its scope includes not only "omic" data and the fields of high-throughput biology currently served by large public repositories, but also the growing range of harder-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology, and other new types of large-scale shareable data.