Analysis and evaluation of different sequencing depths from 5 to 20 million reads in shotgun metagenomic sequencing, with optimal minimum depth being recommended.

IF 2.3 · Zone 3 (Biology) · JCR Q3 (Biotechnology & Applied Microbiology)
Genome Pub Date : 2022-09-01 Epub Date: 2022-09-06 DOI:10.1139/gen-2021-0120
Jin Liu, Xiaokai Wang, Hailiang Xie, Qinghua Zhong, Yan Xia
Genome, volume 65, issue 9, pages 491-504. https://doi.org/10.1139/gen-2021-0120
Citations: 2

Abstract

This study analyzed and evaluated the impact of shotgun metagenomic sequencing depths from 5 to 20 million reads on metagenome-wide association studies (MWASs), with the aim of determining the optimal minimum sequencing depth. We included a set of 200 previously published gut microbial shotgun metagenomic sequencing datasets from an obesity study (100 obese vs. 100 non-obese). Reads from datasets with original sequencing depths >20 million were downsampled into seven experimental groups with depths from 5 to 20 million (in steps of 2.5 million). Using both integrated gene cluster (IGC) and metagenomic phylogenetic analysis 2 (MetaPhlAn2), we obtained and analyzed the read matching rates, gene counts, species richness and abundance, diversity, and clinical biomarkers of the experimental groups, with the original depth serving as the control group. An additional set of 100 published datasets from a colorectal cancer (CRC) study was included for validation (50 CRC vs. 50 CRC-free). More genes and species were identified as sequencing depth increased. At depths of 15 million or higher, species richness became more stable, with a rate of change of 5% or lower, and species composition became more stable, with an intraclass correlation coefficient (ICC) higher than 0.75. In terms of species abundance, 81% and 97% of species showed significant differences among all groups in IGC and MetaPhlAn2, respectively (p < 0.05). Diversity differed significantly across all groups, and the difference in diversity between the experimental and control groups decreased as sequencing depth increased. The area under the receiver operating characteristic curve (AUC) of the obesity classifier on the obesity test samples showed an increasing trend with sequencing depth (τ = 0.29). The validation results were consistent with these findings.
In summary, the higher the sequencing depth, the more information it provides on microbial structure and composition. At sequencing depths of 15 million or higher, we obtained more stable species compositions and disease classifiers with good performance. We therefore recommend 15 million as the optimal minimum sequencing depth for an MWAS.
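
The downsampling design and two of the stability measures mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names are hypothetical, reads are stood in for by plain strings, the "rate of change" is assumed here to mean the relative change in richness between two depths, and τ is assumed to be Kendall's rank correlation (the abstract does not name the statistic).

```python
import random


def depth_grid(start=5_000_000, stop=20_000_000, step=2_500_000):
    """Target depths from start to stop (inclusive) in fixed steps."""
    return list(range(start, stop + 1, step))


def downsample(reads, depth, seed=0):
    """Draw `depth` reads at random without replacement (seeded for reproducibility)."""
    if depth > len(reads):
        raise ValueError("target depth exceeds the number of available reads")
    return random.Random(seed).sample(reads, depth)


def richness_change_rate(richness_a, richness_b):
    """Relative change in species richness from depth a to depth b (assumed definition)."""
    return abs(richness_b - richness_a) / richness_a


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1      # concordant pair
            elif product < 0:
                score -= 1      # discordant pair; ties contribute nothing
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(depth_grid())  # the seven experimental depths
    toy_reads = [f"read{i}" for i in range(100)]
    print(len(downsample(toy_reads, 25)))
```

With the default arguments, `depth_grid()` yields seven depths, matching the seven experimental groups described above. A real analysis would subsample FASTQ files with a dedicated tool rather than holding reads in a Python list.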

Source journal
Genome (Biology: Biotechnology & Applied Microbiology)
CiteScore: 5.30
Self-citation rate: 3.20%
Articles published: 42
Review time: 6-12 weeks
About the journal: Genome is a monthly journal, established in 1959, that publishes original research articles, reviews, mini-reviews, current opinions, and commentaries. Areas of interest include general genetics and genomics, cytogenetics, molecular and evolutionary genetics, developmental genetics, population genetics, phylogenomics, molecular identification, as well as emerging areas such as ecological, comparative, and functional genomics.