A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing.

Genomics & Informatics. Published 2023-09-01 (Epub 2023-07-31). DOI: 10.5808/gi.23044
Hyeonwoo Kim, Jiwon Kim, Ji Won Choi, Kwang-Sung Ahn, Dong-Il Park, Sangsoo Kim
Volume 21, Issue 3, e40. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584646/pdf/

Abstract

Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. Amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, but they often discard substantial portions of sequencing reads during quality control, particularly in datasets with a large number of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline's performance, we reprocessed two published stool datasets from normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming and the removal of chimeric or unclassifiable reads, whereas DADA2, a commonly used ASV method, retained only 44.6%. Nonetheless, both methods yielded qualitatively similar β-diversity plots. In the second dataset, HmmUFOtu retained 89.2% of read pairs, whereas DADA2 retained a mere 18.4%. Because HmmUFOtu is a closed-reference clustering method, it facilitates merging separately processed datasets: the total abundances (log scale) of OTUs shared between the two datasets showed a correlation coefficient of 0.92. Although the first two dimensions of the β-diversity plot showed the two datasets cohesively mixed, the third dimension revealed a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data and highlights the strengths of HmmUFOtu, including its potential for dataset merging.
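The merging advantage claimed for closed-reference clustering rests on one property: each OTU is named after a reference entry, so the same OTU ID means the same OTU in independently processed runs, and the tables can be joined on that ID without re-clustering. The minimal Python sketch below illustrates that idea and the shared-OTU comparison. It is not the authors' code. The table layout (OTU ID mapped to per-sample read counts), the OTU and sample IDs, and all counts are hypothetical toy values. The log10(x + 1) transform (pseudocount of 1) and Pearson's r are assumptions, because the abstract reports only "a correlation coefficient of 0.92 in total abundance (log scale)".

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: OTU ID -> {sample ID -> read count}.
OtuTable = Dict[str, Dict[str, int]]


def merge_otu_tables(table_a: OtuTable, table_b: OtuTable) -> OtuTable:
    """Union two closed-reference OTU tables on OTU ID, without re-clustering.

    Assumes sample IDs are unique across the two runs. An OTU missing from a
    sample is implicitly a zero count. The input tables are not modified.
    """
    merged: OtuTable = {}
    for table in (table_a, table_b):
        for otu, counts in table.items():
            merged.setdefault(otu, {}).update(counts)
    return merged


def total_abundance(table: OtuTable) -> Dict[str, int]:
    """Sum each OTU's reads over all samples of one run."""
    return {otu: sum(counts.values()) for otu, counts in table.items()}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def shared_otu_log_correlation(
    table_a: OtuTable, table_b: OtuTable, pseudocount: int = 1
) -> Tuple[float, List[str]]:
    """Correlate log10(total + pseudocount) of OTUs present in both runs."""
    tot_a, tot_b = total_abundance(table_a), total_abundance(table_b)
    shared = sorted(tot_a.keys() & tot_b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared OTUs to compute a correlation")
    xs = [math.log10(tot_a[otu] + pseudocount) for otu in shared]
    ys = [math.log10(tot_b[otu] + pseudocount) for otu in shared]
    return pearson(xs, ys), shared


# Toy example: two separately processed runs (hypothetical IDs and counts).
run1 = {
    "OTU_1": {"A1": 120, "A2": 80},
    "OTU_2": {"A1": 10, "A2": 5},
    "OTU_3": {"A1": 0, "A2": 40},
    "OTU_5": {"A1": 3, "A2": 0},
}
run2 = {
    "OTU_1": {"B1": 300},
    "OTU_2": {"B1": 20},
    "OTU_3": {"B1": 70},
    "OTU_4": {"B1": 7},
}
merged = merge_otu_tables(run1, run2)
r, shared = shared_otu_log_correlation(run1, run2)
print(f"{len(merged)} OTUs after merging; shared: {shared}; log-scale r = {r:.3f}")
```

OTUs found in only one run (OTU_4 and OTU_5 here) survive the merge but do not enter the correlation. Note that joining on OTU IDs does not by itself remove batch effects; the abstract reports that one still appeared in the third dimension of the β-diversity plot.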
