Whole genome sequence improvement with pedigree information and reference genotypic profiles, demonstrated in outcrossing apple
Stijn Vanderzande, Cameron Peace, Eric van de Weg
bioRxiv - Genetics, 2024-08-09. DOI: 10.1101/2024.08.08.607141 (https://doi.org/10.1101/2024.08.08.607141)
Understanding the quality of a whole genome sequence (WGS) is important for its further use. Most WGS quality evaluations are based on bioinformatic quality metrics such as the N50 score, BUSCO score, and number of contigs and scaffolds present, yet genetic information considering principles of inheritance could be used to evaluate and improve assembly and phasing. Furthermore, WGS and genome resequencing data of related individuals could provide useful information when large chromosomal segments are shared with the target individual through common ancestry. Here, we show how high-quality, phased, genome-wide genotypic information is useful to evaluate the quality of a WGS. We provide an R-tool to routinely conduct such quality evaluations. The script also provides a method to accurately determine the WGS positions of reference SNP markers, which is needed for integration of SNP array-based genotypic data sets with WGS data, and the identification and comparison of segments across WGSs that are shared by descent. Finally, we provide suggestions on how such sharing can be used to evaluate and improve new WGSs. The approach is demonstrated in apple, for which improvements in WGS quality are evident from the first collapsed WGS with many inconsistencies in genetic marker order and genotype scores, through well-assembled haploid WGSs, to incorrectly and correctly phased diploid WGSs. This study shows that homozygous regions might need extra attention in phased WGSs and that further improvements to phased WGSs can be achieved by grouping chromosomes of single parental origin into the same haplome.
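As an illustration of the kind of marker-order check the abstract describes, the following Python sketch flags reference SNP markers whose physical position in a WGS disagrees with their order on a genetic map. It is not the authors' R tool: the function name `discordant_markers`, the tuple layout, and the use of a longest non-decreasing subsequence to define the "consistent" marker set are all assumptions made for this example.

```python
from bisect import bisect_right


def discordant_markers(markers):
    """Flag markers whose WGS position breaks the genetic-map order.

    markers: list of (name, genetic_cM, physical_bp) tuples for ONE chromosome
    (hypothetical layout chosen for this sketch).
    Returns the names of markers that fall outside the longest set of markers
    whose physical positions are non-decreasing along the genetic map.
    """
    # Walk the markers in genetic-map order.
    ordered = sorted(markers, key=lambda m: (m[1], m[2]))
    pos = [m[2] for m in ordered]

    tails = []      # tails[k]: index of the last marker of the best run of length k+1
    tail_vals = []  # physical positions matching tails (kept non-decreasing)
    prev = [-1] * len(pos)  # back-pointers to rebuild the best run

    for i, p in enumerate(pos):
        # bisect_right allows ties, giving a non-decreasing subsequence.
        k = bisect_right(tail_vals, p)
        if k == len(tails):
            tails.append(i)
            tail_vals.append(p)
        else:
            tails[k] = i
            tail_vals[k] = p
        prev[i] = tails[k - 1] if k > 0 else -1

    # Rebuild the set of markers that are mutually consistent in order.
    keep = set()
    i = tails[-1] if tails else -1
    while i != -1:
        keep.add(i)
        i = prev[i]

    return [ordered[j][0] for j in range(len(ordered)) if j not in keep]


# Toy data (invented for illustration): m3 sits far from its neighbours.
example = [
    ("m1", 0.0, 100),
    ("m2", 1.0, 200),
    ("m3", 2.0, 5000),
    ("m4", 3.0, 300),
    ("m5", 4.0, 400),
]
print(discordant_markers(example))
```

The sketch assumes the chromosome is oriented the same way in the WGS as on the genetic map; a chromosome assembled in reverse orientation would need its coordinates flipped first, or nearly every marker would be flagged.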