Simpute: A Simple Genotype Imputation Method

Y. Lin, Chun-Tien Chang, C. Tang, Wen-Ping Hsieh
Published in: 2012 Sixth International Conference on Complex, Intelligent, and Software Intensive Systems, 2012-07-04
DOI: 10.1109/CISIS.2012.63 (https://doi.org/10.1109/CISIS.2012.63)
Citations: 0

Abstract

High-throughput genotyping technology has made genome-wide association studies possible. Single nucleotide polymorphism (SNP) data from array-based platforms generally have high call rates and good concordance across different genotype-calling schemes, yet they usually still contain missing genotypes. Missing SNPs can bias the results of association analyses, so some studies remove loci with missing data. Imputation compensates for missing data by filling in the most probable values; it can increase the power of an association study without the extra cost of genotyping the missing SNPs. In this article, we propose a simple imputation method, Simpute, that takes advantage of the high SNP density of either array-based or massively parallel sequencing platforms. It relies on the linkage disequilibrium (LD) structure of the chromosome and needs only two nearby SNPs to fill in a missing genotype. Simpute does not use any reference data. We tested the method on chromosome 21 by randomly masking genotypes from the International HapMap Project Phase III data and compared Simpute with two other algorithms. At highly linked SNP loci, it performs approximately as well as BEAGLE, a general-purpose algorithm that integrates much more information. Simpute outperforms the second algorithm, proposed by Jung et al., which, like Simpute, uses no reference samples. The main advantage of Simpute is its computational efficiency: its complexity is a function of n, the number of missing SNPs; w, the number of positions with missing SNPs; and m, the number of individuals considered. Simpute provides a simple, accurate, and fast solution for whole-genome imputation. We have demonstrated that when SNPs are densely distributed along the chromosome and adjacent loci are in high linkage disequilibrium, there is no need for complicated algorithms. Simpute is suitable for routine screening in large-scale SNP genotyping, especially when the sample size is large and efficiency is a major concern in the workflow.
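The abstract says Simpute fills a missing genotype from only two nearby SNPs, relies on LD, uses no reference data, and was evaluated by randomly masking known genotypes. It does not give the exact decision rule, so the Python sketch below is an assumption, not the authors' algorithm. It hypothetically imputes each missing call by majority vote among other individuals in the same sample who share the target individual's genotypes at its two nearest observed SNPs, and it scores accuracy as concordance on randomly masked calls. The 0/1/2 genotype coding, the function names `nearest_observed`, `impute`, and `mask_and_score`, and both fallback rules are illustrative choices that do not come from the paper.

```python
import random
from collections import Counter
from typing import List, Optional

# Genotypes are coded as minor-allele counts (0, 1, 2); None marks a missing call.
# This coding and every name below are illustrative assumptions.
Genotype = Optional[int]


def nearest_observed(row: List[Genotype], j: int, k: int = 2) -> List[int]:
    """Return the indices of the k observed SNPs closest to position j in one
    individual's row (ties broken toward the left)."""
    observed = [c for c, g in enumerate(row) if g is not None and c != j]
    observed.sort(key=lambda c: (abs(c - j), c))
    return observed[:k]


def impute(genotypes: List[List[Genotype]]) -> List[List[Genotype]]:
    """Fill every missing genotype by majority vote among other individuals in
    the same sample (no reference panel) who carry the same genotypes at the
    missing individual's two nearest observed SNPs."""
    result = [list(row) for row in genotypes]  # copy; the input is not modified
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            if g is not None:
                continue
            neighbors = nearest_observed(row, j)
            key = tuple(row[c] for c in neighbors)
            # Votes come only from originally observed genotypes at SNP j.
            votes = Counter(
                other[j]
                for other in genotypes
                if other[j] is not None
                and tuple(other[c] for c in neighbors) == key
            )
            if not votes:
                # Hypothetical fallback: most common observed genotype at SNP j.
                votes = Counter(o[j] for o in genotypes if o[j] is not None)
            # Hypothetical default of 0 if SNP j is missing in everyone.
            result[i][j] = votes.most_common(1)[0][0] if votes else 0
    return result


def mask_and_score(genotypes: List[List[Genotype]], fraction: float,
                   seed: int = 0) -> float:
    """Randomly hide a fraction of observed genotypes (at least one), impute
    them, and return the concordance: the share recovered exactly."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(genotypes)
             for j, g in enumerate(row) if g is not None]
    hidden = rng.sample(cells, max(1, int(fraction * len(cells))))
    masked = [list(row) for row in genotypes]
    for i, j in hidden:
        masked[i][j] = None
    imputed = impute(masked)
    correct = sum(imputed[i][j] == genotypes[i][j] for i, j in hidden)
    return correct / len(hidden)


if __name__ == "__main__":
    # Synthetic toy panel (not HapMap data): 6 individuals x 5 SNPs in perfect LD.
    panel = [[g] * 5 for g in (0, 1, 2, 0, 1, 2)]
    print(mask_and_score(panel, fraction=0.2, seed=1))
```

Matching on exact genotypes at the flanking SNPs is what ties this sketch to LD: when the neighbors are strongly linked to the missing SNP, individuals who agree at the neighbors tend to agree at the missing position as well. This naive version rescans every individual for each missing call and is not claimed to reproduce the complexity reported by the authors.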