The effect of reference species on reference-guided genome assembly

Juyeon Kim, Daehwan Lee, Mikang Sim, Jongin Lee, Jaebum Kim
Proceedings of the 8th International Conference on Computational Systems-Biology and Bioinformatics, published 2017-12-07
DOI: 10.1145/3156346.3156351 (https://doi.org/10.1145/3156346.3156351)
Citations: 0

Abstract

Rapid improvement in next-generation sequencing (NGS) technologies has enabled the production of unprecedented amounts of DNA sequence data at low cost. However, NGS technologies are still limited to generating short DNA sequences, which has led to the development of many assembly algorithms that recover whole-genome sequences from those short sequences. Unfortunately, assembly algorithms alone can only construct scaffold sequences, which are generally much shorter than chromosome sequences. Generating chromosome sequences requires additional, expensive experimental data. To overcome this problem, many studies have developed computational algorithms that further merge the scaffold sequences and produce chromosome-level sequences by using an existing genome assembly of a related species, called a reference. However, even though the quality of the chosen reference assembly is critical for generating a good final assembly, its effect is not yet well understood. In this study, we measured the effect of the reference genome assembly on the quality of the final assembly generated by reference-guided assembly algorithms. Using the genome assemblies of a total of eleven reference species (eight primates and three rodents), human genome sequences were assembled from scaffold sequences by one reference-guided assembly algorithm, RACA, and compared with known genome sequences to measure their quality in terms of the number of misassemblies. The effect of the quality of the reference assemblies was investigated in terms of divergence time from human, alignment coverage between the reference and human, and the extent to which core eukaryotic genes are included. We found that divergence time is a good indicator of the quality of the final assembly when high-quality reference assemblies are used. We believe this study will contribute to broadening our understanding of the effect and importance of a reference assembly on the reference-guided assembly task.
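One of the reference-quality measures named in the abstract is alignment coverage between the reference and human. The abstract does not say how the authors computed it, so the following is only a minimal sketch under an assumed definition: the fraction of bases of one sequence that fall inside at least one alignment block. The function names and the half-open interval convention are illustrative choices, not taken from the paper or from RACA.

```python
from typing import Iterable, List, Tuple

# Assumption: an alignment block is a half-open interval [start, end)
# on a single sequence, in base-pair coordinates.
Interval = Tuple[int, int]


def merge_intervals(blocks: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent half-open intervals."""
    merged: List[Interval] = []
    for start, end in sorted(blocks):
        if end <= start:
            continue  # skip empty or inverted blocks
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def alignment_coverage(blocks: Iterable[Interval], sequence_length: int) -> float:
    """Fraction of the sequence covered by at least one alignment block.

    Blocks are assumed to lie within [0, sequence_length); they are not clamped.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    covered = sum(end - start for start, end in merge_intervals(blocks))
    return covered / sequence_length
```

With toy blocks `[(0, 40), (30, 60), (80, 100)]` on a sequence of length 100, the first two blocks merge into `(0, 60)`, so 80 of 100 bases are covered. Merging before summing keeps overlapping alignments from being counted twice, which matters when many local alignments pile up on the same region.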