Global biogeography of N2-fixing microbes: nifH amplicon database and analytics workflow

IF 11.2 | Zone 1, Earth Sciences | JCR Q1, GEOSCIENCES, MULTIDISCIPLINARY
Michael Morando, Jonathan Magasin, Shunyan Cheung, Matthew M. Mills, Jonathan P. Zehr, Kendra A. Turk-Kubo
DOI: 10.5194/essd-2024-163 (https://doi.org/10.5194/essd-2024-163)
Published: 2024-06-12, Earth System Science Data
Citations: 0

Global biogeography of N2-fixing microbes: nifH amplicon database and analytics workflow
Abstract. Marine nitrogen (N) fixation is a globally significant biogeochemical process carried out by a specialized group of prokaryotes (diazotrophs), yet our understanding of their ecology is constantly evolving. Although marine dinitrogen (N2) fixation is often ascribed to cyanobacterial diazotrophs, indirect evidence suggests that non-cyanobacterial diazotrophs (NCDs) might also be important. One widely used approach for understanding diazotroph diversity and biogeography is polymerase chain reaction (PCR) amplification of a portion of the nifH gene, which encodes a structural component of the N2-fixing enzyme complex, nitrogenase. An array of bioinformatic tools exists to process nifH amplicon data; however, the lack of standardized practices has hindered cross-study comparisons. This has led to a missed opportunity to more thoroughly assess diazotroph biogeography, diversity, and their potential contributions to the marine N cycle. To address these knowledge gaps, a bioinformatic workflow was designed that standardizes the processing of nifH amplicon datasets originating from high-throughput sequencing (HTS). Multiple datasets are efficiently and consistently processed with a specialized DADA2 pipeline to identify amplicon sequence variants (ASVs). A series of customizable post-pipeline stages then detect and discard spurious nifH sequences and annotate the remaining quality-filtered nifH ASVs using multiple reference databases and classification approaches. This newly developed workflow was used to reprocess nearly all publicly available nifH amplicon HTS datasets from marine studies, and to generate a comprehensive nifH ASV database containing 7909 ASVs aggregated from 21 studies that represent the diazotrophic populations in the global ocean. For each sample, the database includes physical and chemical metadata obtained from the Simons Collaborative Marine Atlas Project (CMAP). Here we demonstrate the utility of this database for revealing global biogeographical patterns of prominent diazotroph groups and highlight the influence of sea surface temperature. The workflow and nifH ASV database provide a robust framework for studying marine N2 fixation and diazotrophic diversity captured by nifH amplicon HTS. Future datasets that target understudied ocean regions can be added easily, and users can tune the parameters and the set of included studies to suit their specific focus. The workflow and database are available, respectively, on GitHub (https://github.com/jdmagasin/nifH-ASV-workflow; Morando et al., 2024) and Figshare (https://doi.org/10.6084/m9.figshare.23795943.v1; Morando et al., 2024).
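The abstract describes aggregating ASVs from many studies into one database. Because an ASV is an exact sequence, identical sequences found in different studies can share a single key, which is what makes cross-study aggregation straightforward. The following is a minimal sketch of that idea only, not the authors' workflow code: the function names, the nested-dictionary layout, and the toy sequences are all hypothetical, and the real workflow (DADA2 processing, spurious-sequence filtering, annotation, CMAP metadata) is not reproduced here.

```python
from collections import defaultdict


def merge_asv_tables(study_tables):
    """Flatten per-study count tables into one table keyed by (study, sample).

    study_tables: {study_id: {sample_id: {asv_sequence: read_count}}}
    Returns:      {(study_id, sample_id): {asv_sequence: read_count}}
    Zero counts are dropped so each sample lists only the ASVs it contains.
    """
    merged = {}
    for study_id, samples in study_tables.items():
        for sample_id, counts in samples.items():
            merged[(study_id, sample_id)] = {
                seq: n for seq, n in counts.items() if n > 0
            }
    return merged


def relative_abundance(merged):
    """Convert read counts to per-sample fractions (each sample sums to 1)."""
    rel = {}
    for key, counts in merged.items():
        total = sum(counts.values())
        rel[key] = {seq: n / total for seq, n in counts.items()} if total else {}
    return rel


def studies_per_asv(merged):
    """Count how many distinct studies each ASV sequence appears in."""
    seen = defaultdict(set)
    for (study_id, _sample_id), counts in merged.items():
        for seq in counts:
            seen[seq].add(study_id)
    return {seq: len(studies) for seq, studies in seen.items()}


# Toy input: two hypothetical studies that share one ASV sequence ("ACGT").
toy_tables = {
    "studyA": {"s1": {"ACGT": 30, "TTGA": 10}},
    "studyB": {"s1": {"ACGT": 5, "GGCC": 15, "TTTT": 0}},
}

merged = merge_asv_tables(toy_tables)
rel = relative_abundance(merged)
shared = studies_per_asv(merged)

for key in sorted(rel):
    print(key, rel[key])
print(shared)
```

Keying samples by a `(study, sample)` tuple keeps sample names from different studies from colliding, while keying ASVs by sequence lets the same variant be recognized wherever it occurs.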
Source journal
Earth System Science Data
Categories: GEOSCIENCES, MULTIDISCIPLINARY; METEOROLOGY & ATMOSPHERIC SCIENCES
CiteScore: 18.00
Self-citation rate: 5.30%
Publication volume: 231
Review time: 35 weeks
About the journal: Earth System Science Data (ESSD) is an international, interdisciplinary journal that publishes articles on original research data in order to promote the reuse of high-quality data in the field of Earth system sciences. The journal welcomes submissions of original data or data collections that meet the required quality standards and have the potential to contribute to the goals of the journal. It includes sections dedicated to regular-length articles, brief communications (such as updates to existing data sets), commentaries, review articles, and special issues. ESSD is abstracted and indexed in several databases, including Science Citation Index Expanded, Current Contents/PCE, Scopus, ADS, CLOCKSS, CNKI, DOAJ, EBSCO, Gale/Cengage, GoOA (CAS), and Google Scholar, among others.