Haplotype Block Partitioning for NARAC Dataset Using Interval Graph Modeling of Clusters Algorithm
Fatma S. Ibrahim, Mohamed N. Saad, A. M. Said, Hesham F. A. Hamed
2018 9th Cairo International Biomedical Engineering Conference (CIBEC), December 2018
DOI: 10.1109/CIBEC.2018.8641758
Citations: 1
Abstract
Recently, genome-wide association studies (GWAS) have come to depend on haplotype blocks rather than individual single-nucleotide polymorphisms (SNPs) because blocks are more powerful in association analysis. Computation on a genotyped dataset is challenging because of its massive size and complexity. Several algorithms have been proposed for partitioning genotype data into haplotype blocks. Most existing algorithms partition the genotype data into small blocks and ignore intermediate regions of low linkage disequilibrium (LD) that lie between strongly related SNPs. Other methods produce redundant blocks by declaring a haplotype block only when all SNPs inside it are associated with the block's start and end SNPs. This study adopts a recent haplotype block partitioning method based on the interval graph modeling of clusters algorithm. The proposed algorithm was applied to the North American Rheumatoid Arthritis Consortium (NARAC) dataset and compared with the confidence interval test (CIT), the four-gamete test (FGT), and the solid spine of linkage disequilibrium (SSLD) methods. The dataset was preprocessed, and missing SNPs were imputed. This study demonstrates the distinctions between haplotype block partitioning methods and detects the haplotype blocks for the NARAC dataset. The comparative study gives a better understanding of each method; the methods produce different outcomes under different parameters.
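The abstract does not give the algorithmic details, but the following is a minimal, hypothetical Python sketch of one way interval-based haplotype block partitioning can work: pairs of SNPs in strong pairwise LD (measured here by r², a stand-in for whatever LD statistic the paper actually uses) define intervals over the SNP order, and overlapping intervals are merged into blocks. The threshold, the LD measure, and the merging rule are illustrative assumptions, not the paper's interval graph modeling of clusters algorithm.

```python
import numpy as np

def r_squared(a, b):
    """Squared Pearson correlation between two biallelic SNPs coded 0/1/2."""
    if np.std(a) == 0 or np.std(b) == 0:
        return 0.0
    return np.corrcoef(a, b)[0, 1] ** 2

def strong_ld_intervals(genotypes, threshold=0.8, max_span=50):
    """Collect intervals (i, j) whose endpoint SNPs are in strong pairwise LD.

    genotypes: (n_samples, n_snps) array of 0/1/2 genotype calls.
    threshold and max_span are illustrative parameters, not values from the paper.
    """
    n_snps = genotypes.shape[1]
    intervals = []
    for i in range(n_snps):
        for j in range(i + 1, min(i + max_span, n_snps)):
            if r_squared(genotypes[:, i], genotypes[:, j]) >= threshold:
                intervals.append((i, j))
    return intervals

def merge_intervals_into_blocks(intervals):
    """Merge overlapping intervals; each merged run of SNP indices is one block."""
    if not intervals:
        return []
    intervals.sort()
    blocks = [list(intervals[0])]
    for start, end in intervals[1:]:
        if start <= blocks[-1][1]:           # interval overlaps the current block
            blocks[-1][1] = max(blocks[-1][1], end)
        else:                                # gap in LD: start a new block
            blocks.append([start, end])
    return [tuple(b) for b in blocks]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy genotype matrix: 100 samples x 20 SNPs; SNPs 0-7 are tightly linked
    # because columns 4-7 duplicate columns 0-3, while the rest are random noise.
    base = rng.integers(0, 3, size=(100, 4))
    noise = rng.integers(0, 3, size=(100, 12))
    geno = np.hstack([base, base, noise])
    blocks = merge_intervals_into_blocks(strong_ld_intervals(geno, threshold=0.8))
    print("Detected blocks (SNP index ranges):", blocks)
```

On the toy data above the sketch reports one block spanning SNPs 0 through 7, illustrating the general idea of grouping strongly linked markers; the paper's method, and the CIT, FGT, and SSLD baselines it is compared against, each use their own block definitions and would generally give different partitions.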