A haplotype-resolved reference genome for Eucalyptus grandis.

IF 2.1 | Zone 3 (Biology) | JCR Q3, GENETICS & HEREDITY
Anneri Lötter, Tomas Bruna, Tuan A Duong, Kerrie Barry, Anna Lipzen, Chris Daum, Yuko Yoshinaga, Jane Grimwood, Jerry W Jenkins, Jayson Talag, Justin Borevitz, John T Lovell, Jeremy Schmutz, Jill L Wegrzyn, Alexander A Myburg
Published 2025-05-30 in G3: Genes|Genomes|Genetics. DOI: 10.1093/g3journal/jkaf112 (https://doi.org/10.1093/g3journal/jkaf112)

Abstract

Eucalyptus grandis is a hardwood tree used worldwide, as a pure species or as a hybrid partner, to breed fast-growing plantation forestry crops that serve as feedstocks of timber and lignocellulosic biomass for pulp, paper, biomaterials and biorefinery products. The current v2.0 genome reference for the species (Bartholomé et al. 2015; Myburg et al. 2014) served as the first reference for the genus and has helped drive the development of molecular breeding tools for eucalypts. Using PacBio HiFi long reads and Omni-C proximity ligation sequencing, we produced an improved, haplotype-phased assembly (v4.0) for TAG0014, an early-generation selection of E. grandis. The two haplotypes are 571 Mbp (HAP1) and 552 Mbp (HAP2) in size and consist of 37 and 46 contigs, respectively, scaffolded onto 11 chromosomes (contig N50 of 28.9 and 16.7 Mbp). These haplotype assemblies are 70 to 90 Mbp smaller than the diploid v2.0 assembly but capture all except one of the 22 telomeres, suggesting that substantial redundant sequence was included in the previous assembly. A total of 35,929 (HAP1) and 35,583 (HAP2) gene models were annotated, of which 438 and 472 contain long introns (>10 kbp) and were previously (v2.0) identified as multiple smaller genes. These and other improvements have increased gene annotation completeness from 93.8% to 99.4% in the v4.0 assembly. We found that 6,493 (HAP1) and 6,346 (HAP2) genes lie within tandem duplicate arrays (18.4% and 17.8% of the total, respectively) and that >43.8% of the haplotype assemblies consist of repeat elements. Analysis of synteny between the haplotypes and the E. grandis v2.0 reference genome revealed extensive regions of collinearity, but also some major rearrangements, and provided a preview of population and pan-genome variation in the species.
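The contig N50 values quoted above follow the usual definition: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The sketch below illustrates that calculation only. The function name `n50` and the toy contig lengths are assumptions made for illustration; they are not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that pushes the running
        # total to at least half of the overall length.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in Mbp (NOT the E. grandis contigs).
toy_contigs_mbp = [40, 30, 20, 5, 3, 2]
print(n50(toy_contigs_mbp))
```

With real data, the lengths would be read from the assembly FASTA index rather than typed in by hand. Because N50 depends only on the length distribution, a higher value for HAP1 than for HAP2 indicates that more of HAP1 sits in long contigs, not that HAP1 is more accurate.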

Source journal: G3: Genes|Genomes|Genetics