Analysis and evaluation of different sequencing depths from 5 to 20 million reads in shotgun metagenomic sequencing, with optimal minimum depth being recommended.
IF 2.3 · Zone 3 (Biology) · JCR Q3 (Biotechnology & Applied Microbiology)
Jin Liu, Xiaokai Wang, Hailiang Xie, Qinghua Zhong, Yan Xia
Genome. Published 2022-09-01 (Epub 2022-09-06). DOI: 10.1139/gen-2021-0120 (https://doi.org/10.1139/gen-2021-0120)
Citations: 2
Abstract
This study analyzed and evaluated the impact of shotgun metagenomic sequencing depths from 5 to 20 million reads on metagenome-wide association studies (MWASs), and sought to determine the optimal minimum sequencing depth. We included 200 previously published gut microbial shotgun metagenomic samples related to obesity (100 obese vs. 100 non-obese). Reads from samples with original sequencing depths >20 million were downsampled into seven experimental groups with depths from 5 to 20 million (interval 2.5 million). Using both integrated gene cluster (IGC) and metagenomic phylogenetic analysis 2 (MetaPhlAn2), we obtained and analyzed the read matching rates, gene counts, species richness and abundance, diversity, and clinical biomarkers of the experimental groups, with the original depth serving as the control group. An additional set of 100 published samples from a colorectal cancer (CRC) study was included for validation (50 CRC vs. 50 CRC-free). More genes and species were identified as sequencing depth increased. At depths of 15 million or higher, species richness became more stable, with a rate of change of 5% or lower, and species composition became more stable, with an intraclass correlation coefficient (ICC) higher than 0.75. In terms of species abundance, 81% and 97% of species showed significant differences among all groups in IGC and MetaPhlAn2, respectively (p < 0.05). Diversity differed significantly across all groups, and the difference in diversity between the experimental and control groups decreased as sequencing depth increased. The area under the receiver operating characteristic curve (AUC) of the obesity classifier on the obesity test samples showed an increasing trend with sequencing depth (τ = 0.29). The validation results were consistent with these findings.
We found that higher sequencing depth provides more information on microbial structure and composition. At sequencing depths of 15 million or higher, we obtained more stable species compositions and disease classifiers with good performance. We therefore recommend 15 million as the optimal minimum sequencing depth for an MWAS.
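The downsampling design and the richness stability check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, random sampling without replacement is an assumed downsampling method, and the "changing rate" is assumed to mean the relative change in species richness between consecutive depths, which the abstract does not define.

```python
import random


def depth_grid(start=5_000_000, stop=20_000_000, step=2_500_000):
    """Return the experimental depths: 5 to 20 million reads in 2.5 million steps."""
    return list(range(start, stop + 1, step))


def subsample_reads(reads, depth, seed=0):
    """Draw `depth` reads at random without replacement (assumed method)."""
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of available reads")
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return rng.sample(reads, depth)


def richness_change_rates(richness_by_depth):
    """Relative change in richness between consecutive depths (assumed definition).

    `richness_by_depth` maps each depth to the number of species detected there.
    The result maps each depth (except the lowest) to its change rate.
    """
    depths = sorted(richness_by_depth)
    rates = {}
    for prev, cur in zip(depths, depths[1:]):
        before = richness_by_depth[prev]
        rates[cur] = (richness_by_depth[cur] - before) / before
    return rates


def first_stable_depth(rates, threshold=0.05):
    """Smallest depth from which every change rate stays within `threshold`."""
    depths = sorted(rates)
    for i, depth in enumerate(depths):
        if all(abs(rates[d]) <= threshold for d in depths[i:]):
            return depth
    return None  # richness never stabilizes within the tested depths
```

With these helpers, `depth_grid()` yields the seven experimental depths, and `first_stable_depth(richness_change_rates(...))` reports where richness starts changing by 5% or less, given per-depth richness values measured by the reader's own profiling tool.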
About the journal:
Genome is a monthly journal, established in 1959, that publishes original research articles, reviews, mini-reviews, current opinions, and commentaries. Areas of interest include general genetics and genomics, cytogenetics, molecular and evolutionary genetics, developmental genetics, population genetics, phylogenomics, molecular identification, as well as emerging areas such as ecological, comparative, and functional genomics.