Efficient algorithms for genome-wide tagSNP selection across populations via the linkage disequilibrium criterion.

Lan Liu, Yonghui Wu, S. Lonardi, Tao Jiang
{"title":"Efficient algorithms for genome-wide tagSNP selection across populations via the linkage disequilibrium criterion.","authors":"Lan Liu, Yonghui Wu, S. Lonardi, Tao Jiang","doi":"10.1142/9781860948732_0011","DOIUrl":null,"url":null,"abstract":"In this paper, we study the tagSNP selection problem on multiple populations using the pairwise r(2) linkage disequilibrium criterion. We propose a novel combinatorial optimization model for the tagSNP selection problem, called the minimum common tagSNP selection (MCTS) problem, and present efficient solutions for MCTS. Our approach consists of three main steps including (i) partitioning the SNP markers into small disjoint components, (ii) applying some data reduction rules to simplify the problem, and (iii) applying either a fast greedy algorithm or a Lagrangian relaxation algorithm to solve the remaining (general) MCTS. These algorithms also provide lower bounds on tagging (i.e. the minimum number of tagSNPs needed). The lower bounds allow us to evaluate how far our solution is from the optimum. To the best of our knowledge, it is the first time tagging lower bounds are discussed in the literature. We assess the performance of our algorithms on real HapMap data for genome-wide tagging. The experiments demonstrate that our algorithms run 3 to 4 orders of magnitude faster than the existing single-population tagging programs like FESTA, LD-Select and the multiple-population tagging method MultiPop-TagSelect. Our method also greatly reduces the required tagSNPs compared to LD-Select on a single population and MultiPop-TagSelect on multiple populations. Moreover, the numbers of tagSNPs selected by our algorithms are almost optimal since they are very close to the corresponding lower bounds obtained by our method.","PeriodicalId":72665,"journal":{"name":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","volume":"6 1","pages":"67-78"},"PeriodicalIF":0.0000,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/9781860948732_0011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

In this paper, we study the tagSNP selection problem on multiple populations using the pairwise r(2) linkage disequilibrium criterion. We propose a novel combinatorial optimization model for the tagSNP selection problem, called the minimum common tagSNP selection (MCTS) problem, and present efficient solutions for MCTS. Our approach consists of three main steps including (i) partitioning the SNP markers into small disjoint components, (ii) applying some data reduction rules to simplify the problem, and (iii) applying either a fast greedy algorithm or a Lagrangian relaxation algorithm to solve the remaining (general) MCTS. These algorithms also provide lower bounds on tagging (i.e. the minimum number of tagSNPs needed). The lower bounds allow us to evaluate how far our solution is from the optimum. To the best of our knowledge, it is the first time tagging lower bounds are discussed in the literature. We assess the performance of our algorithms on real HapMap data for genome-wide tagging. The experiments demonstrate that our algorithms run 3 to 4 orders of magnitude faster than the existing single-population tagging programs like FESTA, LD-Select and the multiple-population tagging method MultiPop-TagSelect. Our method also greatly reduces the required tagSNPs compared to LD-Select on a single population and MultiPop-TagSelect on multiple populations. Moreover, the numbers of tagSNPs selected by our algorithms are almost optimal since they are very close to the corresponding lower bounds obtained by our method.
基于连锁不平衡准则的全基因组标记snp选择算法。
本文利用配对r(2)连锁不平衡准则研究了多种群的标签snp选择问题。针对标签snp选择问题,我们提出了一种新的组合优化模型,称为最小共同标签snp选择(MCTS)问题,并给出了MCTS的有效解决方案。我们的方法包括三个主要步骤,包括(i)将SNP标记划分为小的不连接的组件,(ii)应用一些数据约简规则来简化问题,以及(iii)应用快速贪婪算法或拉格朗日松弛算法来解决剩余的(一般)MCTS。这些算法还提供了标签的下限(即所需的标签snp的最小数量)。下界允许我们计算解离最优解的距离。据我们所知,这是文献中第一次讨论标注下界。我们评估了我们的算法在全基因组标记的真实HapMap数据上的性能。实验表明,我们的算法运行速度比现有的单种群标记程序(如FESTA, LD-Select和多种群标记方法MultiPop-TagSelect)快3到4个数量级。与单种群的LD-Select和多种群的MultiPop-TagSelect相比,我们的方法也大大减少了所需的tagsnp。此外,我们的算法选择的标签snp数量几乎是最优的,因为它们非常接近我们的方法得到的相应下界。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信