{"title":"RadixHap:一个基于基数树的启发式算法,用于解决单个单倍型问题。","authors":"Tai-Chun Wang, Javid Taheri, Albert Y Zomaya","doi":"10.1504/IJBRA.2015.067336","DOIUrl":null,"url":null,"abstract":"<p><p>Single nucleotide polymorphism studies have recently received significant amount of attention from researchers in many life science disciplines. Previous researches indicated that a series of SNPs from the same chromosome, called haplotype, contains more information than individual SNPs. Hence, discovering ways to reconstruct reliable Single Individual Haplotypes becomes one of the core issues in the whole-genome research nowadays. However, obtaining sequence from current high-throughput sequencing technologies always contain inevitable sequencing errors and/or missing information. The SIH reconstruction problem can be formulated as bi-partitioning the input SNP fragment matrix into paternal and maternal sections to achieve minimum error correction; a problem that is proved to be NP-hard. In this study, we introduce a greedy approach, named RadixHap, to handle data sets with high error rates. The experimental results show that RadixHap can generate highly reliable results in most cases. Furthermore, the algorithm structure of RadixHap is particularly suitable for whole-genome scale data sets. </p>","PeriodicalId":35444,"journal":{"name":"International Journal of Bioinformatics Research and Applications","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJBRA.2015.067336","citationCount":"1","resultStr":"{\"title\":\"RadixHap: a radix tree-based heuristic for solving the single individual haplotyping problem.\",\"authors\":\"Tai-Chun Wang, Javid Taheri, Albert Y Zomaya\",\"doi\":\"10.1504/IJBRA.2015.067336\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Single nucleotide polymorphism studies have recently received significant amount of attention from researchers in many life science disciplines. Previous researches indicated that a series of SNPs from the same chromosome, called haplotype, contains more information than individual SNPs. Hence, discovering ways to reconstruct reliable Single Individual Haplotypes becomes one of the core issues in the whole-genome research nowadays. However, obtaining sequence from current high-throughput sequencing technologies always contain inevitable sequencing errors and/or missing information. The SIH reconstruction problem can be formulated as bi-partitioning the input SNP fragment matrix into paternal and maternal sections to achieve minimum error correction; a problem that is proved to be NP-hard. In this study, we introduce a greedy approach, named RadixHap, to handle data sets with high error rates. The experimental results show that RadixHap can generate highly reliable results in most cases. Furthermore, the algorithm structure of RadixHap is particularly suitable for whole-genome scale data sets. </p>\",\"PeriodicalId\":35444,\"journal\":{\"name\":\"International Journal of Bioinformatics Research and Applications\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1504/IJBRA.2015.067336\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Bioinformatics Research and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1504/IJBRA.2015.067336\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Health Professions\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Bioinformatics Research and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1504/IJBRA.2015.067336","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Health Professions","Score":null,"Total":0}
RadixHap: a radix tree-based heuristic for solving the single individual haplotyping problem.
Single nucleotide polymorphism studies have recently received significant amount of attention from researchers in many life science disciplines. Previous researches indicated that a series of SNPs from the same chromosome, called haplotype, contains more information than individual SNPs. Hence, discovering ways to reconstruct reliable Single Individual Haplotypes becomes one of the core issues in the whole-genome research nowadays. However, obtaining sequence from current high-throughput sequencing technologies always contain inevitable sequencing errors and/or missing information. The SIH reconstruction problem can be formulated as bi-partitioning the input SNP fragment matrix into paternal and maternal sections to achieve minimum error correction; a problem that is proved to be NP-hard. In this study, we introduce a greedy approach, named RadixHap, to handle data sets with high error rates. The experimental results show that RadixHap can generate highly reliable results in most cases. Furthermore, the algorithm structure of RadixHap is particularly suitable for whole-genome scale data sets.
期刊介绍:
Bioinformatics is an interdisciplinary research field that combines biology, computer science, mathematics and statistics into a broad-based field that will have profound impacts on all fields of biology. The emphasis of IJBRA is on basic bioinformatics research methods, tool development, performance evaluation and their applications in biology. IJBRA addresses the most innovative developments, research issues and solutions in bioinformatics and computational biology and their applications. Topics covered include Databases, bio-grid, system biology Biomedical image processing, modelling and simulation Bio-ontology and data mining, DNA assembly, clustering, mapping Computational genomics/proteomics Silico technology: computational intelligence, high performance computing E-health, telemedicine Gene expression, microarrays, identification, annotation Genetic algorithms, fuzzy logic, neural networks, data visualisation Hidden Markov models, machine learning, support vector machines Molecular evolution, phylogeny, modelling, simulation, sequence analysis Parallel algorithms/architectures, computational structural biology Phylogeny reconstruction algorithms, physiome, protein structure prediction Sequence assembly, search, alignment Signalling/computational biomedical data engineering Simulated annealing, statistical analysis, stochastic grammars.