{"title":"5","authors":"Bernardo Kucinski","doi":"10.7551/mitpress/11741.003.0008","DOIUrl":null,"url":null,"abstract":"High quality reference genomes are vital to study the impact of sequence variation on genome structure and function. Recent advancements in long-read sequencing have greatly improved the quality of de novo genome assemblies and enhanced the detection of sequence variants at the scale of hundreds or thousands of bases. The nematode Caenorhabditis elegans is a powerful model system for both genetic and evolutionary studies. Comparisons between two diverged wild isolates, the Bristol and Hawaiian strains, have been widely utilized in the analysis of small genetic structural variations in C. elegans. The reference genomes most widely used for these isolates were assembled using short-read sequencing, which makes the detection of large structural variations challenging. To comprehensively detect both large and small structural variations as well as sequence divergence in the Hawaiian and Bristol C. elegans isolates, we generated de novo genome assemblies for each strain using both long- and short-read sequencing. With these assemblies, we annotate over 3.1Mb of sequence divergence between the Bristol and Hawaiian isolates: 337,584 SNPs, 94,503 small insertion-deletions (<50bp), and 4,334 structural variations (>50bp). By comparing our de novo genome assembly of the Bristol isolate to the VC2010 Bristol assembly, we also reveal that lab lineages display 1,162 SNPs, 1,528 indels, as well as 897 structural variations, over 2Mb of total variation. Our work highlights both the importance of using long-read sequencing in de novo genome assembly to identify the total genetic variation between strains and the underappreciated impact of long-term laboratory cultivation on genome structure.","PeriodicalId":224723,"journal":{"name":"Tao te Ching","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Tao te Ching","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.7551/mitpress/11741.003.0008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
High-quality reference genomes are vital for studying the impact of sequence variation on genome structure and function. Recent advances in long-read sequencing have greatly improved the quality of de novo genome assemblies and enhanced the detection of sequence variants at the scale of hundreds or thousands of bases. The nematode Caenorhabditis elegans is a powerful model system for both genetic and evolutionary studies. Comparisons between two diverged wild isolates, the Bristol and Hawaiian strains, have been widely used to analyze small genetic structural variations in C. elegans. The reference genomes most widely used for these isolates were assembled using short-read sequencing, which makes the detection of large structural variations challenging. To comprehensively detect both large and small structural variations as well as sequence divergence in the Hawaiian and Bristol C. elegans isolates, we generated de novo genome assemblies for each strain using both long- and short-read sequencing. With these assemblies, we annotate over 3.1 Mb of sequence divergence between the Bristol and Hawaiian isolates: 337,584 SNPs, 94,503 small insertion-deletions (<50 bp), and 4,334 structural variations (>50 bp). By comparing our de novo genome assembly of the Bristol isolate to the VC2010 Bristol assembly, we also reveal that lab lineages differ by 1,162 SNPs, 1,528 indels, and 897 structural variations, together amounting to over 2 Mb of total variation. Our work highlights both the importance of using long-read sequencing in de novo genome assembly to identify the total genetic variation between strains and the underappreciated impact of long-term laboratory cultivation on genome structure.
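The size classes quoted in the abstract (SNPs, small insertion-deletions under 50 bp, structural variations over 50 bp) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `classify_variant`, the toy allele pairs, and the choice to count a variant of exactly 50 bp as structural are all assumptions made for illustration, and the sketch only handles length-changing variants, not inversions or translocations.

```python
from collections import Counter

# Size cutoff quoted in the abstract (<50 bp small indel, >50 bp structural).
SV_THRESHOLD_BP = 50


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternate allele strings."""
    if len(ref) == len(alt):
        # Same length: a single-base change is a SNP; longer equal-length
        # changes are labelled separately (hypothetical category name).
        return "SNP" if len(ref) == 1 else "multi_base_substitution"
    size = abs(len(ref) - len(alt))
    # Assumption: exactly 50 bp is treated as structural, because the
    # abstract does not say which class that boundary case belongs to.
    if size < SV_THRESHOLD_BP:
        return "small_indel"
    return "structural_variant"


# Toy (ref, alt) pairs, invented for illustration only.
toy_variants = [
    ("A", "G"),             # single-base substitution
    ("AT", "A"),            # 1 bp deletion
    ("A", "A" + "C" * 60),  # 60 bp insertion
]

counts = Counter(classify_variant(ref, alt) for ref, alt in toy_variants)
for category, n in sorted(counts.items()):
    print(f"{category}: {n}")
```

Tallying real calls this way would require parsing a variant file first; the sketch deliberately leaves that step out to avoid implying a particular tool or file layout.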