Long-read HiFi sequencing correctly assembles repetitive heavy fibroin silk genes in new moth and caddisfly genomes

A. Kawahara, Caroline G. Storer, A. Markee, J. Heckenhauer, A. Powell, David M. Plotkin, S. Hotaling, T. Cleland, Rebecca B. Dikow, Torsten Dikow, Ryoichi B. Kuranishi, Rebeccah L. Messcher, S. Pauls, R. Stewart, K. Tojo, P. Frandsen
Journal: GigaByte (Hong Kong, China)
Publication date: 2022-06-03
DOI: 10.1101/2022.06.01.494423 (https://doi.org/10.1101/2022.06.01.494423)
Citations: 11

Abstract

Insect silk is a remarkably versatile biomaterial. Lepidoptera and their sister lineage, Trichoptera, display some of the most diverse uses of silk, which varies in strength, adhesive qualities and elastic properties. Silk fibroin genes are long (> 20 kb) and contain many repetitive motifs, features that make them challenging to sequence. Most research thus far has focused on the conserved N- and C-terminal regions of fibroin genes because a full comparison of repetitive regions across taxa has not been possible. Using the PacBio Sequel II system and SMRT sequencing, we generated high-fidelity (HiFi) long-read genomic and transcriptomic sequences for the Indianmeal moth (Plodia interpunctella) and genomic sequences for the caddisfly Eubasilissa regina. Both genome assemblies were highly contiguous (N50 = 9.7 Mbp and 32.4 Mbp; L50 = 13 and 11, for P. interpunctella and E. regina, respectively) and complete (BUSCO Complete = 99.3% and 95.2%), with complete and contiguous recovery of the silk heavy fibroin gene sequences. This study demonstrates that HiFi long-read sequencing can substantially improve our understanding of genes with highly contiguous, repetitive regions.
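The contiguity statistics quoted in the abstract follow the usual assembly definitions: N50 is the length of the contig at which the running sum of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length, and L50 is the number of contigs needed to reach that point. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50_l50` and the toy contig lengths are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def n50_l50(contig_lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths.

    Hypothetical helper for illustration; not taken from the study.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")

    # Walk the contigs from longest to shortest.
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2

    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that brings the running sum to at least half
        # of the assembly defines N50 (its length) and L50 (its rank).
        if running >= half_total:
            return length, count

    # Not reachable for non-empty input with positive lengths.
    raise AssertionError("running sum never reached half of the total")


if __name__ == "__main__":
    # Toy lengths (arbitrary units), not data from the paper.
    toy_lengths = [50, 40, 30, 20, 10]
    n50, l50 = n50_l50(toy_lengths)
    print(f"N50 = {n50}, L50 = {l50}")
```

With the toy lengths above, the total is 150, so the halfway point of 75 is reached by the second-longest contig. A higher N50 together with a lower L50 indicates that most of the assembly sits in a few long contigs, which is the sense in which the abstract describes both assemblies as highly contiguous.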