Akito Y Kawahara, Caroline G Storer, Amanda Markee, Jacqueline Heckenhauer, Ashlyn Powell, David Plotkin, Scott Hotaling, Timothy P Cleland, Rebecca B Dikow, Torsten Dikow, Ryoichi B Kuranishi, Rebeccah Messcher, Steffen U Pauls, Russell J Stewart, Koji Tojo, Paul B Frandsen
GigaByte (Hong Kong, China), 2022, gigabyte64. Published 2022-06-30. DOI: 10.46471/gigabyte.64. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9693786/pdf/
Long-read HiFi sequencing correctly assembles repetitive heavy fibroin silk genes in new moth and caddisfly genomes.
Insect silk is a versatile biomaterial. Lepidoptera and Trichoptera display some of the most diverse uses of silk, with varying strength, adhesive qualities, and elastic properties. Silk fibroin genes are long (>20 Kbp), with many repetitive motifs that make them challenging to sequence. Most research thus far has focused on conserved N- and C-terminal regions of fibroin genes because a full comparison of repetitive regions across taxa has not been possible. Using the PacBio Sequel II system and SMRT sequencing, we generated high-fidelity (HiFi) long-read genomic and transcriptomic sequences for the Indianmeal moth (Plodia interpunctella) and genomic sequences for the caddisfly Eubasilissa regina. Both genomes were highly contiguous (N50 = 9.7 Mbp and 32.4 Mbp, L50 = 13 and 11, respectively) and complete (BUSCO complete = 99.3% and 95.2%, respectively), with complete and contiguous recovery of silk heavy fibroin gene sequences. We show that HiFi long-read sequencing is helpful for understanding genes with long, repetitive regions.
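The contiguity figures quoted above (N50 and L50) follow the standard definitions: with contigs sorted from longest to shortest, L50 is the smallest number of contigs whose combined length reaches half of the total assembly length, and N50 is the length of the last contig in that set. The following is a minimal Python sketch of that calculation. The function name `n50_l50` and the toy contig lengths are illustrative assumptions, not the authors' pipeline or data.

```python
from typing import Iterable, Tuple


def n50_l50(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths in bp."""
    # Sort longest first so the running sum reaches 50% with the fewest contigs.
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")

    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            # `length` is the N50; `count` is the L50.
            return length, count
    raise AssertionError("unreachable: running sum always reaches half the total")


# Hypothetical toy assembly (arbitrary units), NOT data from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_contigs))  # (70, 2): 80 + 70 = 150, half of the 300 total
```

Read this way, an L50 of 13 means that the 13 longest contigs already cover half of the assembly, which is why a high N50 paired with a low L50 indicates a contiguous genome.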