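High-quality genome assemblies for nine non-model North American insect species representing six orders (Insecta: Coleoptera, Diptera, Hemiptera, Hymenoptera, Lepidoptera, Neuroptera)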

IF 5.5 | Zone 1 (Biology) | JCR Q1 (Biochemistry & Molecular Biology)
Kimberly K. O. Walden, Yanghui Cao, Christopher J. Fields, Alvaro G. Hernandez, Gloria A. Rendon, Gene E. Robinson, Rachel K. Skinner, Jeffrey A. Stein, Christopher H. Dietrich
Journal: Molecular Ecology Resources, volume 24, issue 8
Publication date: 2024-08-18
Publication type: Journal Article
DOI: 10.1111/1755-0998.14010
URL: https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.14010
Citations: 0

High-quality genome assemblies for nine non-model North American insect species representing six orders (Insecta: Coleoptera, Diptera, Hemiptera, Hymenoptera, Lepidoptera, Neuroptera)

Field-collected specimens were used to obtain nine high-quality genome assemblies from a total of 10 insect species native to prairies and savannas of central Illinois (USA): Mellilla xanthometata (Lepidoptera: Geometridae), Stenolophus ochropezus (Coleoptera: Carabidae), Forcipata loca (Hemiptera: Cicadellidae), Coelinius sp. (Hymenoptera: Braconidae), Thaumatomyia glabra (Diptera: Chloropidae), Brachynemurus abdominalus (Neuroptera: Myrmeleontidae), Catonia carolina (Hemiptera: Achilidae), Oncometopia orbona (Hemiptera: Cicadellidae), Flexamia atlantica (Hemiptera: Cicadellidae) and Stictocephala bisonia (Hemiptera: Membracidae). Sequencing library preparation from single specimens was successful despite extremely small DNA yields (<0.1 μg) for some samples. Additional sequencing and assembly workflows were adapted to each sample depending on the initial DNA yield. PacBio circular consensus (CCS/HiFi) or continuous long read (CLR) libraries were used to sequence DNA fragments up to 50 kb in length, with Illumina-sequenced linked reads (TellSeq libraries) and Omni-C libraries used for scaffolding and gap-filling. Assembled genome sizes ranged from 135 Mb to 3.2 Gb. The number of assembled scaffolds ranged from 47 to >13,000, with the longest scaffold per assembly ranging from ~23 to 439 Mb. Genome completeness was high, with BUSCO scores ranging from 85.5% for the largest genome (Stictocephala bisonia) to 98.8% for the smallest genome (Coelinius sp.). Unique content, estimated using RepeatMasker and GenomeScope2, ranged from 50.7% to 75.8% and roughly decreased with increasing genome size. Structural annotation predicted 19,281–72,469 protein models per sequenced species. Sequencing costs per genome at the time ranged from US$3,000 to US$5,000; for samples with PacBio HiFi data, each genome averaged ~1600 CPU-hours on a high-performance cluster and required approximately 14 h of bioinformatics analyses. Most assemblies would benefit from further manual curation to correct possible scaffold misjoins and translocations suggested by off-diagonal or depleted signals in Omni-C contact maps.
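
The assembly metrics quoted in the abstract (assembled size, number of scaffolds, longest scaffold) can all be derived from a list of scaffold lengths. The following is a minimal sketch, not the authors' workflow: the function name `summarize_assembly` and the toy lengths are hypothetical, and N50 is included only as a commonly reported contiguity metric that the abstract itself does not mention.

```python
from typing import Dict, List


def summarize_assembly(scaffold_lengths: List[int]) -> Dict[str, int]:
    """Summarize an assembly from its scaffold lengths (in base pairs)."""
    if not scaffold_lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(scaffold_lengths, reverse=True)
    total = sum(ordered)
    # N50: length of the scaffold at which the running total, taken from
    # longest to shortest, first reaches half of the assembled size.
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "total_bp": total,
        "num_scaffolds": len(ordered),
        "longest_bp": ordered[0],
        "n50_bp": n50,
    }


# Hypothetical toy lengths for illustration; not values from the paper.
toy_lengths = [50, 30, 10, 5, 5]
stats = summarize_assembly(toy_lengths)
print(stats)
```

In practice the lengths would come from the assembly FASTA or its index; the toy list here only shows how the three reported quantities relate to one another.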

Source journal: Molecular Ecology Resources (Biology: Evolutionary Biology)
CiteScore: 15.60
Self-citation rate: 5.20%
Articles published: 170
Review time: 3 months
About the journal: Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines. In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.