Oliver W. White, Andie Hall, Ben W. Price, Suzanne T. Williams, Matthew D. Clark
Molecular Ecology Resources, vol. 25, issue 1 (published 2024-10-28). DOI: 10.1111/1755-0998.14036 (https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.14036)
A Snakemake Toolkit for the Batch Assembly, Annotation and Phylogenetic Analysis of Mitochondrial Genomes and Ribosomal Genes From Genome Skims of Museum Collections
Abstract
Low-coverage ‘genome skims’ are often used to assemble organelle genomes and ribosomal gene sequences for cost-effective phylogenetic and barcoding studies. Natural history collections hold invaluable biological information, yet poor preservation results in degraded DNA that often hinders polymerase chain reaction-based analyses. However, it is possible to build libraries from, and sequence, the short fragments typical of degraded DNA to generate genome skims from museum collections. Here we introduce a Snakemake toolkit consisting of three pipelines, skim2mito, skim2rrna and gene2phylo, designed to unlock the genomic potential of historical museum specimens using genome skimming. Specifically, skim2mito and skim2rrna perform the batch assembly, annotation and phylogenetic analysis of mitochondrial genomes and nuclear ribosomal genes, respectively, from low-coverage genome skims. The third pipeline, gene2phylo, takes a set of gene alignments and performs phylogenetic analysis of individual genes, partitioned analysis of concatenated alignments and phylogenetic analysis based on gene trees. We benchmark our pipelines with simulated data and then test them on a novel genome-skimming dataset from both recent and historical solariellid gastropod samples. We show that the toolkit can recover mitochondrial and ribosomal genes from poorly preserved museum specimens of the gastropod family Solariellidae, and that the resulting phylogeny is consistent with our current understanding of taxonomic relationships. Bioinformatic pipelines that facilitate the processing of large quantities of sequence data from the vast repository of specimens held in natural history museum collections will greatly aid species discovery and the exploration of biodiversity over time, ultimately supporting conservation efforts in the face of a changing planet.
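Among gene2phylo's analyses is a partitioned analysis of concatenated alignments. As a rough illustration of that general technique only, and not the toolkit's actual code (which is not described here), the Python sketch below concatenates per-gene alignments into a supermatrix. It pads taxa missing from a gene with gap characters and records each gene's column range as a partition. The function names, the gap-padding choice and the RAxML-style `DNA, gene = start-end` partition lines are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence (hypothetical layout)
Partition = Tuple[str, int, int]  # (gene name, first column, last column), 1-based


def concatenate_alignments(
    genes: Dict[str, Alignment], missing: str = "-"
) -> Tuple[Dict[str, str], List[Partition]]:
    """Concatenate per-gene alignments into one supermatrix.

    Taxa absent from a gene are padded with `missing` characters so every
    row ends up the same length. Genes are concatenated in dict order.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Partition] = []
    start = 1
    for gene, aln in genes.items():
        # Every sequence within one gene alignment must have the same length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene} are missing or unequal in length")
        length = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


def raxml_partition_lines(partitions: List[Partition], model: str = "DNA") -> List[str]:
    """Format partitions as RAxML-style lines such as 'DNA, cox1 = 1-6'."""
    return [f"{model}, {gene} = {first}-{last}" for gene, first, last in partitions]


if __name__ == "__main__":
    genes = {
        "cox1": {"A": "ATGCGT", "B": "ATGCTT"},
        "18S": {"A": "GGA", "C": "GGT"},
    }
    matrix, parts = concatenate_alignments(genes)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print("\n".join(raxml_partition_lines(parts)))
```

Padding missing taxa with gaps keeps every gene in the supermatrix even when sampling is incomplete, which is common for degraded museum material; the partition ranges then let a tree-inference program fit a separate substitution model to each gene.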
About the journal:
Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines.
In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.