RadixHap: a radix tree-based heuristic for solving the single individual haplotyping problem.

Q4 Health Professions

International Journal of Bioinformatics Research and Applications Pub Date : 2015-01-01 DOI:10.1504/IJBRA.2015.067336

Tai-Chun Wang, Javid Taheri, Albert Y Zomaya

Volume 11, Issue 1, pp. 10-29. Publication type: Journal Article.

Citations: 1

Abstract

Single nucleotide polymorphism (SNP) studies have recently received a significant amount of attention from researchers in many life science disciplines. Previous research has indicated that a series of SNPs from the same chromosome, called a haplotype, contains more information than individual SNPs. Hence, finding ways to reconstruct reliable single individual haplotypes (SIH) has become one of the core issues in whole-genome research. However, sequences obtained from current high-throughput sequencing technologies inevitably contain sequencing errors and/or missing information. The SIH reconstruction problem can be formulated as bi-partitioning the input SNP fragment matrix into paternal and maternal sections so as to achieve minimum error correction, a problem that has been proved to be NP-hard. In this study, we introduce a greedy approach, named RadixHap, to handle data sets with high error rates. The experimental results show that RadixHap can generate highly reliable results in most cases. Furthermore, the algorithmic structure of RadixHap is particularly suitable for whole-genome scale data sets.
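
The minimum error correction objective named in the abstract can be made concrete with a small sketch. The code below is not RadixHap and is not taken from the paper; it only scores one given bi-partition of a toy SNP fragment matrix. The encoding (`"0"`/`"1"` for alleles, `"-"` for a missing call), the majority-vote consensus, the tie rule, and the names `consensus`, `mismatches`, and `mec_score` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

GAP = "-"  # assumed marker for a missing SNP call


def consensus(fragments: Sequence[str], n_sites: int) -> str:
    """Majority-vote haplotype over the fragments of one partition."""
    out: List[str] = []
    for j in range(n_sites):
        ones = sum(1 for f in fragments if f[j] == "1")
        zeros = sum(1 for f in fragments if f[j] == "0")
        if ones == 0 and zeros == 0:
            out.append(GAP)  # no fragment covers this site
        else:
            # Ties are resolved to "0" (an arbitrary choice for this sketch).
            out.append("1" if ones > zeros else "0")
    return "".join(out)


def mismatches(fragment: str, haplotype: str) -> int:
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(
        1
        for a, b in zip(fragment, haplotype)
        if a != GAP and b != GAP and a != b
    )


def mec_score(matrix: Sequence[str], assignment: Sequence[int]) -> Tuple[int, str, str]:
    """Score a bi-partition: total corrections needed, plus both consensus haplotypes.

    assignment[i] is 0 or 1 and names the partition of fragment i.
    """
    n_sites = len(matrix[0])
    groups: Tuple[List[str], List[str]] = ([], [])
    for frag, side in zip(matrix, assignment):
        groups[side].append(frag)
    h0 = consensus(groups[0], n_sites)
    h1 = consensus(groups[1], n_sites)
    score = sum(mismatches(f, h0) for f in groups[0])
    score += sum(mismatches(f, h1) for f in groups[1])
    return score, h0, h1


if __name__ == "__main__":
    # Toy matrix: rows are fragments, columns are SNP sites.
    matrix = ["0101-", "01011", "1010-", "-0100", "0111-"]
    assignment = [0, 0, 1, 1, 0]
    print(mec_score(matrix, assignment))  # (1, '01011', '10100')
```

Scoring a fixed partition this way is cheap; the NP-hardness lies in searching over all possible assignments for the one with the lowest score, which is why heuristics are used in practice.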


Source journal

International Journal of Bioinformatics Research and Applications (Health Professions: Health Information Management)

CiteScore

0.60

Self-citation rate

0.00%

Articles published

Journal description: Bioinformatics is an interdisciplinary research field that combines biology, computer science, mathematics and statistics into a broad-based field that will have profound impacts on all fields of biology. The emphasis of IJBRA is on basic bioinformatics research methods, tool development, performance evaluation and their applications in biology. IJBRA addresses the most innovative developments, research issues and solutions in bioinformatics and computational biology and their applications. Topics covered include: Databases, bio-grid, system biology; Biomedical image processing, modelling and simulation; Bio-ontology and data mining, DNA assembly, clustering, mapping; Computational genomics/proteomics; Silico technology: computational intelligence, high performance computing; E-health, telemedicine; Gene expression, microarrays, identification, annotation; Genetic algorithms, fuzzy logic, neural networks, data visualisation; Hidden Markov models, machine learning, support vector machines; Molecular evolution, phylogeny, modelling, simulation, sequence analysis; Parallel algorithms/architectures, computational structural biology; Phylogeny reconstruction algorithms, physiome, protein structure prediction; Sequence assembly, search, alignment; Signalling/computational biomedical data engineering; Simulated annealing, statistical analysis, stochastic grammars.