Haplotype parsing: methods for extracting information from human genetic variations.

Applied Bioinformatics. Pub Date: 2004-01-01. DOI: 10.2165/00822942-200403020-00012

Russell Schwartz

Volume 3, issue 2-3, pages 181-91.

Citations: 2

Abstract

While the shared consensus genetic sequence of our species contains a great deal of information about our common biology, there is also much to be learned from the subtle genetic variations across our species. These variations are believed to be generally of little or no direct functional significance and predominantly reflect the chance accumulation of small genetic changes since our emergence as a species. Therefore, they carry little useful information when observed in a single individual. When tallied across a whole population, though, these chance mutations can teach us a great deal about our evolutionary history and the patterns of inheritance in particular individuals. In particular, frequently observed patterns of single nucleotide polymorphisms (SNPs) in a population can identify chromosomal segments that have been passed down largely intact through long stretches of our evolution. Finding these frequently conserved chromosomal segments, or haplotypes, and developing methods to identify haplotype patterns in particular individuals will in turn help us to identify those particular segments that carry genetic factors influencing risk for many common human diseases. To make the best use of these data, we will need to develop new models for the encoding of information in genome variations (the "language of genetic variation") and new algorithms for fitting datasets to those models. This article surveys past work by the author and colleagues on this problem, utilising computational methods for locating frequent patterns in haploid sequence data and for "parsing" sequences so as to optimally explain them given knowledge of the general population structure. The author's recent work in this area has been compiled into a set of computational tools available at http://www-2.cs.cmu.edu/~russells/software/hapmotif.html.
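The two computational ideas named in the abstract, locating frequent patterns in haploid sequence data and "parsing" a sequence to explain it in terms of those patterns, can be illustrated with a small sketch. This is not the algorithm implemented in the author's hapmotif tools; it is a generic stand-in under stated assumptions. Haplotypes are assumed to be equal-length strings of biallelic SNP calls encoded as "0"/"1"; `frequent_patterns` simply counts allele strings within a chosen column window and keeps those seen at least `min_count` times; and `parse` (with its hypothetical `switch_cost` parameter) is a swapped-in minimum-cost "mosaic" decode, Viterbi-style dynamic programming that labels each site with one reference pattern while penalising mismatches and switches between patterns.

```python
from collections import Counter
from typing import Dict, List, Tuple


def frequent_patterns(haplotypes: List[str], start: int, end: int,
                      min_count: int) -> Dict[str, int]:
    """Count the allele strings seen in columns [start, end) across all
    haplotypes and keep those observed at least min_count times.
    Assumption: haplotypes are equal-length '0'/'1' strings."""
    counts = Counter(h[start:end] for h in haplotypes)
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


def parse(seq: str, motifs: List[str],
          switch_cost: float = 1.0) -> Tuple[float, List[int]]:
    """Explain seq as a mosaic of full-length reference patterns (motifs).

    Each site is assigned one motif index. The cost is the number of sites
    where seq disagrees with its assigned motif, plus switch_cost for every
    change of motif between adjacent sites. Returns (total cost, labels).
    Hypothetical scoring scheme, chosen only for illustration.
    """
    n, m = len(seq), len(motifs)
    if n == 0 or m == 0:
        return 0.0, []
    # cost[i][k]: best cost of explaining seq[:i+1] with site i labelled k
    cost = [[0.0] * m for _ in range(n)]
    # back[i][k]: motif label at site i-1 on that best path
    back = [[0] * m for _ in range(n)]
    for k in range(m):
        cost[0][k] = float(seq[0] != motifs[k][0])
    for i in range(1, n):
        # cheapest motif at the previous site, used when we pay for a switch
        best_prev = min(range(m), key=lambda j: cost[i - 1][j])
        for k in range(m):
            stay = cost[i - 1][k]
            switch = cost[i - 1][best_prev] + switch_cost
            if stay <= switch:
                cost[i][k], back[i][k] = stay, k
            else:
                cost[i][k], back[i][k] = switch, best_prev
            cost[i][k] += float(seq[i] != motifs[k][i])
    # trace back the optimal labelling from the cheapest final state
    k = min(range(m), key=lambda j: cost[n - 1][j])
    total = cost[n - 1][k]
    labels = [k]
    for i in range(n - 1, 0, -1):
        k = back[i][k]
        labels.append(k)
    labels.reverse()
    return total, labels


if __name__ == "__main__":
    haps = ["0011", "0011", "1100", "1100", "0101"]
    print(frequent_patterns(haps, 0, 4, 2))   # {'0011': 2, '1100': 2}
    print(parse("0000", ["0011", "1100"]))    # (1.0, [0, 0, 1, 1])
```

In the usage example, the sequence "0000" matches neither frequent pattern exactly, but it is explained at cost 1 as the first half of "0011" followed by the second half of "1100", i.e. one switch and no mismatches. Raising `switch_cost` trades switches for mismatches, which is the kind of tradeoff any parse "given the knowledge of the general population structure" has to make in some form.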
