Whole genome sequence improvement with pedigree information and reference genotypic profiles, demonstrated in outcrossing apple

Stijn Vanderzande, Cameron Peace, Eric van de Weg
bioRxiv - Genetics, published 2024-08-09. DOI: 10.1101/2024.08.08.607141 (https://doi.org/10.1101/2024.08.08.607141)

Abstract

Understanding the quality of a whole genome sequence (WGS) is important for its further use. Most WGS quality evaluations are based on bioinformatic quality metrics such as the N50 score, BUSCO score, and number of contigs and scaffolds present, yet genetic information considering principles of inheritance could be used to evaluate and improve assembly and phasing. Furthermore, WGS and genome resequencing data of related individuals could provide useful information when large chromosomal segments are shared with the target individual through common ancestry. Here, we show how high-quality, phased, genome-wide genotypic information is useful to evaluate the quality of a WGS. We provide an R-tool to routinely conduct such quality evaluations. The script also provides a method to accurately determine the WGS positions of reference SNP markers, which is needed for integration of SNP array-based genotypic data sets with WGS data, and the identification and comparison of segments across WGSs that are shared by descent. Finally, we provide suggestions on how such sharing can be used to evaluate and improve new WGSs. The approach is demonstrated in apple, for which improvements in WGS quality are evident from the first collapsed WGS with many inconsistencies in genetic marker order and genotype scores, through well-assembled haploid WGSs, to incorrectly and correctly phased diploid WGSs. This study shows that homozygous regions might need extra attention in phased WGSs and that further improvements to phased WGSs can be achieved by grouping chromosomes of single parental origin into the same haplome.
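The abstract mentions inconsistencies in genetic marker order as one sign of a lower-quality WGS. As a rough illustration of that idea only, the Python sketch below flags SNP markers whose physical order on an assembled chromosome disagrees with their order on a reference genetic map. It is not the authors' R tool: the `Marker` fields, the function name, the example values, and the use of a longest non-decreasing subsequence as the consistency criterion are all assumptions made for this example, and it further assumes that the chromosome is oriented the same way in the map and in the assembly.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Marker(NamedTuple):
    """Hypothetical record for one SNP marker on one chromosome."""
    name: str          # marker identifier (illustrative)
    genetic_cm: float  # position on the reference genetic map, in cM
    physical_bp: int   # position of the marker on the assembly, in bp


def out_of_order_markers(markers: List[Marker]) -> List[str]:
    """Return names of markers whose physical order conflicts with map order.

    Markers are sorted by genetic position; the largest set of markers whose
    physical positions are non-decreasing in that order is treated as
    consistent, and every other marker is reported.
    """
    ordered = sorted(markers, key=lambda m: (m.genetic_cm, m.physical_bp))
    positions = [m.physical_bp for m in ordered]

    tails: List[int] = []      # tails[k]: index ending the best run of length k + 1
    tail_vals: List[int] = []  # physical positions matching `tails`
    prev = [-1] * len(ordered) # back-pointers used to recover the run

    for i, pos in enumerate(positions):
        # bisect_right allows equal positions, giving a non-decreasing run.
        k = bisect_right(tail_vals, pos)
        if k == len(tails):
            tails.append(i)
            tail_vals.append(pos)
        else:
            tails[k] = i
            tail_vals[k] = pos
        prev[i] = tails[k - 1] if k > 0 else -1

    # Walk the back-pointers to collect the consistent markers.
    keep = set()
    i = tails[-1] if tails else -1
    while i != -1:
        keep.add(i)
        i = prev[i]

    return [m.name for idx, m in enumerate(ordered) if idx not in keep]


# Invented example data: m2 sits far from where the genetic map predicts.
example = [
    Marker("m1", 0.0, 100_000),
    Marker("m2", 5.0, 900_000),
    Marker("m3", 10.0, 400_000),
    Marker("m4", 15.0, 600_000),
    Marker("m5", 20.0, 1_200_000),
]
print(out_of_order_markers(example))  # prints ['m2']
```

A flagged marker may point to a misassembled region, but it may equally reflect an error in the genetic map or an ambiguous marker position, so the output is best read as a list of candidates for closer inspection.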