Estimating the recombination parameter: a commentary on 'Estimating the recombination parameter of a finite population model without selection' by Richard R. Hudson.

B. Weir
{"title":"Estimating the recombination parameter: a commentary on 'Estimating the recombination parameter of a finite population model without selection' by Richard R. Hudson.","authors":"B. Weir","doi":"10.1017/S0016672308009622","DOIUrl":null,"url":null,"abstract":"In 1987, Hudson proposed an estimator for the scaled recombination parameter C=4Nc, where N is the population size and c is the recombination rate between the two most distant of a set of segregating sites. This work came shortly after Kreitman (1983) published the first set of population genetic data at the DNA sequence level. Kreitman had been able to sequence 2.7 kilobases of the Drosophila melanogaster genome in 11 samples. It was felt at that time that population genetics was entering a new era, although Hudson cautioned that sufficiently large data sets for his new estimator ‘may require prohibitively large research efforts ’. Hudson’s estimator is based on the variance of the number of site differences between pairs of haplotypes and an estimate of the scaled mutation rate h=4Nm. The variance of the number of differences had already been shown by Brown et al. (1980) to be a convenient single-statistic summary of all the pairwise linkage disequilibria among a set of loci. The need for such a statistic continues as there is still doubt as to how well two-locus associations capture the full multilocus structure. Hudson provided an elegant derivation of the expected value of his statistic as a function of the unknown value C. His method of moments approach to estimation has the great virtue of simplicity although it would not be expected to behave as well as the maximum-likelihood methods that he (Hudson, 1993) and others (e.g. Kuhner et al., 2000; Wall, 2000; Fearnhead and Donnelly, 2001) developed later. Likelihood methods exploit all the information in a data set rather than just the information in a summary statistic and will do well provided the underlying evolutionary model is appropriate for the data being addressed. Writing 10 years after Hudson, Wakeley kept the same moment approach but provided modifications to Hudson’s method that improved its performance. Since 1983 the human genome has been sequenced, as have the genomes of several other species. There is now a ‘1000 genomes’ project (http://www.1000 genomes.org) under way for humans, and new sequencing techniques will make it possible very soon for population geneticists to obtain large samples of DNA sequence data. In 1987, Hudsonwished formore extensive DNA sequence data but he could not have foreseen the remarkable explosion of intermediate data – single-nucleotide polymorphisms (SNPs). Human geneticists are now generating 1 million SNP profiles for samples of thousands of individuals. By 2002, Hudson had produced a simulation procedure for SNP data (Hudson, 2002), and this has been used in studies such as Li and Stephens (2003) to detect recombination rate ‘hotspots ’. Hudson’s 1987 paper has the hallmarks of a classic paper. It introduced a new and simple method for estimating recombination rates from population samples rather than from pedigree data. More sophisticated methods have since been introduced, including composite-likelihood (Hudson, 2001) and others reviewed by Hellenthal and Stephens (2006), but the original method still has utility in evolutionary studies (e.g. 
Meikeljohn et al., 2004).","PeriodicalId":12777,"journal":{"name":"Genetical research","volume":"20 1","pages":"425-6"},"PeriodicalIF":0.0000,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genetical research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1017/S0016672308009622","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In 1987, Hudson proposed an estimator for the scaled recombination parameter C = 4Nc, where N is the population size and c is the recombination rate between the two most distant of a set of segregating sites. This work came shortly after Kreitman (1983) published the first set of population genetic data at the DNA sequence level: Kreitman had sequenced 2.7 kilobases of the Drosophila melanogaster genome in 11 samples. It was felt at the time that population genetics was entering a new era, although Hudson cautioned that sufficiently large data sets for his new estimator 'may require prohibitively large research efforts'.

Hudson's estimator is based on the variance of the number of site differences between pairs of haplotypes and on an estimate of the scaled mutation rate θ = 4Nμ. Brown et al. (1980) had already shown the variance of the number of differences to be a convenient single-statistic summary of all the pairwise linkage disequilibria among a set of loci. The need for such a statistic continues, as there is still doubt about how well two-locus associations capture the full multilocus structure. Hudson provided an elegant derivation of the expected value of his statistic as a function of the unknown value C. His method-of-moments approach to estimation has the great virtue of simplicity, although it would not be expected to perform as well as the maximum-likelihood methods that he (Hudson, 1993) and others (e.g. Kuhner et al., 2000; Wall, 2000; Fearnhead and Donnelly, 2001) developed later. Likelihood methods exploit all the information in a data set rather than just the information in a summary statistic, and they will do well provided the underlying evolutionary model is appropriate for the data being analysed. Writing 10 years after Hudson, Wakeley retained the same moment approach but modified Hudson's method in ways that improved its performance.

Since 1983 the human genome has been sequenced, as have the genomes of several other species. There is now a '1000 genomes' project (http://www.1000genomes.org) under way for humans, and new sequencing techniques will very soon make it possible for population geneticists to obtain large samples of DNA sequence data. In 1987, Hudson wished for more extensive DNA sequence data, but he could not have foreseen the remarkable explosion of intermediate data: single-nucleotide polymorphisms (SNPs). Human geneticists are now generating profiles of 1 million SNPs for samples of thousands of individuals. By 2002, Hudson had produced a simulation procedure for SNP data (Hudson, 2002), and this has been used in studies such as Li and Stephens (2003) to detect recombination rate 'hotspots'.

Hudson's 1987 paper has the hallmarks of a classic. It introduced a new and simple method for estimating recombination rates from population samples rather than from pedigree data. More sophisticated methods have since been introduced, including composite likelihood (Hudson, 2001) and others reviewed by Hellenthal and Stephens (2006), but the original method still has utility in evolutionary studies (e.g. Meiklejohn et al., 2004).
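To make the moment-matching idea concrete, the following minimal Python sketch (a hypothetical illustration, not Hudson's published code) computes the two summary statistics his estimator requires from a toy 0/1 haplotype matrix: the variance of the number of pairwise site differences and Watterson's estimate of θ = 4Nμ. The toy data, function names, and the choice of variance denominator are all assumptions made for illustration; Hudson's derived expectation of the variance is not reproduced, so the final root-finding step for Ĉ is only indicated in a comment.

```python
import itertools
import numpy as np

def pairwise_differences(haplotypes):
    """Number of differing sites for every pair of haplotypes.

    haplotypes: (n, S) array of 0/1 alleles, one row per sequence.
    """
    return np.array([
        np.sum(haplotypes[i] != haplotypes[j])
        for i, j in itertools.combinations(range(len(haplotypes)), 2)
    ])

def watterson_theta(haplotypes):
    """Watterson's estimate of theta = 4*N*mu: the number of
    segregating sites S divided by the harmonic number a_n."""
    n, _ = haplotypes.shape
    segregating = np.sum(np.any(haplotypes != haplotypes[0], axis=0))
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / a_n

# Toy sample (hypothetical data): 4 haplotypes at 6 segregating sites.
haps = np.array([
    [0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
])

k = pairwise_differences(haps)
s2_k = k.var(ddof=0)            # variance over all n(n-1)/2 pairs
theta_hat = watterson_theta(haps)
print(f"variance of pairwise differences: {s2_k:.3f}")
print(f"Watterson theta estimate:         {theta_hat:.3f}")
# Hudson's moment estimator C-hat would now be obtained by solving
# E[S_k^2; C, theta_hat] = s2_k for C, using the expectation he
# derived in the 1987 paper (not reproduced here), e.g. with a
# standard one-dimensional root-finder.
```

On real data one would substitute Hudson's full expectation formula and solve the moment equation numerically; the simplicity of the resulting procedure, relative to the likelihood methods discussed above, is exactly the virtue the commentary notes.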