The Challenge of Genome Sequence Assembly

Q3 Computer Science

Open Bioinformatics Journal. Pub Date: 2018-10-17. DOI: 10.2174/1875036201811010231

A. Collins


Citations: 3

Abstract

Although whole genome sequencing is enabling numerous advances in many fields, achieving complete chromosome-level sequence assemblies for diverse species remains difficult. The problems in part reflect the limitations of current sequencing technologies. Chromosome assembly from ‘short read’ sequence data is confounded by repetitive genome regions containing numerous similar sequence tracts, which cannot be accurately positioned in the assembled sequence. Longer sequence reads often have higher error rates and may still be too short to span the larger gaps between contigs.

Given the emergence of exciting new applications of sequencing technology, such as the Earth BioGenome Project, it is necessary to further develop and apply a range of strategies to achieve robust chromosome-level sequence assembly. Reviewed here is a range of methods to enhance assembly, including the use of cross-species synteny to understand relationships between sequence contigs; the development of independent genetic and/or physical scaffold maps as frameworks for assembly (for example, radiation hybrid, optical motif and chromatin interaction maps); and the use of patterns of linkage disequilibrium to help position, orient and locate contigs.

A range of methods exists which might be further developed to facilitate cost-effective, large-scale sequence assembly for diverse species. A combination of strategies is required to best assemble sequence data into chromosome-level assemblies. There are a number of routes towards the development of maps which span chromosomes (including physical, genetic and linkage disequilibrium maps), and construction of these whole-chromosome maps greatly facilitates the ordering and orientation of sequence contigs.
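To make the idea of using a chromosome-spanning map as an assembly framework concrete, the following is a minimal illustrative sketch, not a method taken from the article. It assumes each marker is known both by its offset within a contig and by its position on some independent map (genetic, physical or linkage disequilibrium). The function name `order_and_orient`, the tuple layout and all marker values are hypothetical. Contigs are ordered by the median map position of their markers and oriented by the sign of the covariance between contig offset and map position.

```python
from collections import defaultdict
from statistics import median


def order_and_orient(markers):
    """Order and orient contigs from (contig, contig_pos, map_pos) tuples.

    Returns (contig, orientation) pairs sorted along the map. Orientation is
    '+', '-', or '?' when the markers cannot decide it (e.g. a single marker).
    """
    by_contig = defaultdict(list)
    for contig, contig_pos, map_pos in markers:
        by_contig[contig].append((contig_pos, map_pos))

    placed = []
    for contig, points in by_contig.items():
        # Anchor the contig at the median map position of its markers.
        anchor = median(m for _, m in points)
        # Sign of the covariance between contig offset and map position
        # indicates whether the contig runs with or against the map.
        mean_c = sum(c for c, _ in points) / len(points)
        mean_m = sum(m for _, m in points) / len(points)
        cov = sum((c - mean_c) * (m - mean_m) for c, m in points)
        orientation = "+" if cov > 0 else "-" if cov < 0 else "?"
        placed.append((anchor, contig, orientation))

    placed.sort()
    return [(contig, orientation) for _, contig, orientation in placed]


# Hypothetical markers: (contig name, offset in contig, position on the map).
markers = [
    ("ctgB", 100, 52.0), ("ctgB", 900, 47.5),  # map position falls as offset rises
    ("ctgA", 200, 10.0), ("ctgA", 800, 14.0),  # map position rises with offset
    ("ctgC", 500, 80.0),                       # single marker: placed, not oriented
]
layout = order_and_orient(markers)
print(layout)  # [('ctgA', '+'), ('ctgB', '-'), ('ctgC', '?')]
```

A single marker can place a contig on the map but cannot orient it, which is one reason denser maps, or a combination of map types, help with assembly.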

Source journal

Open Bioinformatics Journal Computer Science-Computer Science (miscellaneous)

CiteScore

2.40

Self-citation rate

0.00%

About the journal: The Open Bioinformatics Journal is an Open Access online journal which publishes research articles, reviews/mini-reviews, letters, clinical trial studies and guest-edited single-topic issues in all areas of bioinformatics and computational biology. The coverage includes biomedicine, focusing on large data acquisition, analysis and curation; computational and statistical methods for the modeling and analysis of biological data; and descriptions of new algorithms and databases. The Open Bioinformatics Journal, a peer-reviewed journal, is an important and reliable source of current information on developments in the field. The emphasis is on publishing quality articles rapidly and making them freely available worldwide.