Identifying large sets of unrelated individuals and unrelated markers.

Q2 Decision Sciences
Kuruvilla Joseph Abraham, Clara Diaz
Source Code for Biology and Medicine 9(1):6. Published 2014-03-17. DOI: 10.1186/1751-0473-9-6 (https://doi.org/10.1186/1751-0473-9-6)
Citations: 19

Abstract

Background: Genetic analyses in large sample populations are important for understanding the variation between populations, for designing conservation programs, and for detecting rare mutations that may be risk factors for a variety of diseases, among other reasons. However, these analyses frequently assume that the participating individuals or animals are mutually unrelated. In large samples this assumption may not hold, which can lead to erroneous conclusions. To retain as much data as possible while minimizing the risk of false positives, it is useful to identify a large subset of relatively unrelated individuals in the population. This can be done using a heuristic for finding a large set of independent nodes in an undirected graph, that is, a set of nodes no two of which are joined by an edge. We describe a fast randomized heuristic for this purpose. The same methodology can also be used to identify a suitable set of markers for analyzing population stratification, and in other settings where a rapid heuristic for maximal independent sets in large graphs is needed.
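The graph formulation in the Background can be sketched as follows. This is an illustration only, not the input format or code used by FastIndep: the function name `build_relatedness_graph`, the dense relatedness matrix, and the `threshold` parameter are all assumptions made for the example. Each individual is a node, and two individuals are joined by an edge when their pairwise relatedness exceeds the chosen threshold, so that an independent set of nodes corresponds to a set of mutually "unrelated" individuals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adjacency list: adj[i] holds the indices of all nodes joined to node i.
using Graph = std::vector<std::vector<std::size_t>>;

// Hypothetical helper (not part of FastIndep): join two individuals by an
// edge whenever their pairwise relatedness exceeds `threshold`. The matrix
// is assumed to be square and symmetric; only the upper triangle is read.
Graph build_relatedness_graph(const std::vector<std::vector<double>>& relatedness,
                              double threshold) {
    const std::size_t n = relatedness.size();
    Graph adj(n);
    for (std::size_t i = 0; i < n; ++i) {
        assert(relatedness[i].size() == n);  // every row must have n entries
        for (std::size_t j = i + 1; j < n; ++j) {
            if (relatedness[i][j] > threshold) {
                adj[i].push_back(j);
                adj[j].push_back(i);
            }
        }
    }
    return adj;
}
```

The same construction would apply to markers if the matrix held a pairwise linkage measure instead of relatedness; the choice of measure and threshold is left to the analyst.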

Results: We present FastIndep, a fast randomized heuristic algorithm for finding a maximal independent set of nodes in an arbitrary undirected graph, along with an efficient implementation in C++. On a 64-bit Linux or macOS platform the execution time is a few minutes, even for a graph with several thousand nodes. The algorithm can discover multiple solutions of the same cardinality. FastIndep can be used to discover unlinked markers and unrelated individuals in populations.
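The abstract does not spell out FastIndep's internal steps, so the sketch below shows a generic randomized greedy heuristic for maximal independent sets rather than the authors' algorithm. The names `random_maximal_independent_set` and `best_of_restarts`, the restart count, and the seed are assumptions made for illustration. Each pass visits the nodes in random order and keeps a node only if none of its neighbours has been kept; repeating the pass and retaining the largest result is one simple way to search for larger sets.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Adjacency list: adj[i] holds the indices of all nodes joined to node i.
using Graph = std::vector<std::vector<std::size_t>>;

// One randomized greedy pass (a generic heuristic, not FastIndep itself).
// The result is always a maximal independent set, but it is not guaranteed
// to be a maximum one. The returned indices are sorted in ascending order.
std::vector<std::size_t> random_maximal_independent_set(const Graph& adj,
                                                        std::mt19937& rng) {
    std::vector<std::size_t> order(adj.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), rng);

    std::vector<char> blocked(adj.size(), 0);  // 1 = chosen or adjacent to a chosen node
    std::vector<std::size_t> chosen;
    for (std::size_t v : order) {
        if (blocked[v]) continue;
        chosen.push_back(v);
        blocked[v] = 1;
        for (std::size_t u : adj[v]) blocked[u] = 1;
    }
    std::sort(chosen.begin(), chosen.end());
    return chosen;
}

// Repeat the pass `restarts` times and keep the largest set seen.
std::vector<std::size_t> best_of_restarts(const Graph& adj, int restarts,
                                          unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::size_t> best;
    for (int r = 0; r < restarts; ++r) {
        std::vector<std::size_t> candidate = random_maximal_independent_set(adj, rng);
        if (candidate.size() > best.size()) best = std::move(candidate);
    }
    return best;
}
```

Because different random orders can yield different sets of equal size, collecting every candidate that ties with the best size would be a natural way to report multiple solutions of the same cardinality.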

Conclusions: The methods presented here provide a quick and efficient way to identify sets of unrelated individuals in large populations and unlinked markers in marker panels. The C++ source code and instructions, along with utilities for generating input files in the appropriate format, are available at http://taurus.ansci.iastate.edu/wiki/people/jabr/Joseph_Abraham.html.

Source journal: Source Code for Biology and Medicine (Decision Sciences: Information Systems and Management)
About the journal: Source Code for Biology and Medicine is a peer-reviewed, open access, online journal that publishes articles on source code employed over a wide range of applications in biology and medicine. The journal's aim is to publish source code for distribution and use in the public domain in order to advance biological and medical research. Through this dissemination, it may be possible to shorten the time required to solve certain computational problems for which source code or resources are in limited supply.