Haplotype Block Partitioning for NARAC Dataset Using Interval Graph Modeling of Clusters Algorithm

2018 9th Cairo International Biomedical Engineering Conference (CIBEC) Pub Date : 2018-12-01 DOI:10.1109/CIBEC.2018.8641758

Fatma S. Ibrahim, Mohamed N. Saad, A. M. Said, Hesham F. A. Hamed


Citations: 1

Abstract

Genome-wide association studies (GWAS) increasingly rely on haplotype blocks rather than individual single-nucleotide polymorphisms (SNPs), because blocks are more powerful in association analysis. Computing on a genotyped dataset is challenging because of its massive size and complexity. Several algorithms have been proposed for partitioning genotype data into haplotype blocks. Most existing algorithms split genotype data into small blocks and ignore the intermediate regions of low linkage disequilibrium (LD) between strongly related SNPs. Other methods produce redundant blocks because they declare a haplotype block whenever all SNPs inside it are associated with the block's start and end SNPs. This study adopts the latest haplotype block partitioning method, which is based on the interval graph modeling of clusters algorithm. The algorithm was applied to the North American Rheumatoid Arthritis Consortium (NARAC) dataset and compared with the confidence interval test (CIT), four-gamete test (FGT), and solid spine of linkage disequilibrium (SSLD) methods. The dataset was preprocessed, and missing SNPs were imputed. The study demonstrates the distinctions between haplotype block partitioning methods and detects the haplotype blocks for the NARAC dataset. The comparison gives a better understanding of each method and shows that each produces different outcomes under different parameter settings.
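To make one of the comparison methods concrete, the following is a minimal sketch of the idea behind the four-gamete test (FGT): for two biallelic SNPs, observing all four allele combinations (00, 01, 10, 11) among phased haplotypes is taken as evidence of historical recombination, so a block is closed at that point. The abstract gives no implementation details, so everything here is an assumption for illustration: haplotypes are phased and encoded as 0/1, the function names and the `min_count` threshold are hypothetical, and the greedy left-to-right scan is a simplification rather than the procedure used in the paper or in any particular tool.

```python
from itertools import product


def has_four_gametes(haplotypes, i, j, min_count=1):
    """Return True if all four allele combinations at SNPs i and j
    occur at least min_count times among the phased haplotypes."""
    counts = {pair: 0 for pair in product((0, 1), repeat=2)}
    for hap in haplotypes:
        counts[(hap[i], hap[j])] += 1
    return all(c >= min_count for c in counts.values())


def fgt_blocks(haplotypes, min_count=1):
    """Greedy partition into (start, end) SNP index ranges, inclusive.

    The current block is extended while the next SNP shows fewer than
    four gametes with every SNP already in the block.
    Assumes a non-empty list of equal-length 0/1 haplotypes.
    """
    n_snps = len(haplotypes[0])
    blocks = []
    start = 0
    for j in range(1, n_snps):
        # Close the block if SNP j shows all four gametes with any member.
        if any(has_four_gametes(haplotypes, i, j, min_count)
               for i in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    blocks.append((start, n_snps - 1))
    return blocks


# Toy data: 4 phased haplotypes over 4 SNPs (illustrative only).
toy_haplotypes = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
print(fgt_blocks(toy_haplotypes))  # [(0, 1), (2, 3)]
```

In the toy data, SNPs 0 and 1 always carry the same allele, as do SNPs 2 and 3, while SNPs 0 and 2 show all four combinations, so the scan yields two blocks. Raising `min_count` makes the test tolerant of rare fourth gametes, which is one example of how parameter choices change the resulting partition.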
