Phasing millions of samples achieves near perfect accuracy, enabling parent-of-origin analyses.

Impact factor 3.3; JCR Q2 (Genetics & Heredity)
Cole M Williams, Jared O'Connell, Ethan Jewett, William A Freyman, Christopher R Gignoux, Sohini Ramachandran, Amy L Williams
HGG Advances, article 100479. Journal Article, published 2025-07-22. DOI: 10.1016/j.xhgg.2025.100479 (https://doi.org/10.1016/j.xhgg.2025.100479)
Citations: 0

Abstract

Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for genetic analyses. Here, we benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: >8 million research-consented 23andMe, Inc. customers and the UK Biobank (UKB). Remarkably, both methods' median switch error rate (SER) (after excluding single SNP switches, which we call 'blips') is 0.00% across all tested 23andMe trio children and 0.026% in British samples from UKB. Across UKB samples, switch errors predominantly occur in regions lacking identity-by-descent (IBD) coverage. SHAPEIT and Beagle excel at intra-chromosomal phasing, but lack the ability to phase across chromosomes, motivating us to develop HAPTiC (HAPlotype Tiling and Clustering), an inter-chromosomal phasing method that assigns paternal and maternal variants genome-wide. Our approach uses IBD segments to phase blocks of variants on different chromosomes. HAPTiC represents the segments a focal individual shares with their relatives as nodes in a signed graph and performs spectral clustering. We test HAPTiC on 1022 UKB trios, yielding a median per-site phase error of 0.13% in regions covered by IBD segments (45.1% of sites). We also ran HAPTiC in the 23andMe database and found a median phase error rate of 0.49% in Europeans (100% of sites) and 0.16% in admixed Africans (99.8% of sites). HAPTiC enables analyses that require the parent-of-origin of variants, such as association studies and ancestry inference of untyped parents.
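The switch error rate with "blips" excluded can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give their exact counting rules. The sketch assumes that phase at each heterozygous site is encoded as 0 or 1, that a blip is a pair of switches at adjacent heterozygous sites (a single flipped site), and that the denominator stays at the number of adjacent site pairs. The function names `switch_positions` and `switch_error_rate` are hypothetical.

```python
def switch_positions(truth, inferred):
    """Return indices i where the inferred phase flips relative to the
    truth between heterozygous site i-1 and site i."""
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must cover the same sites")
    # agree[i] is True when the inferred phase matches the truth at site i.
    agree = [t == p for t, p in zip(truth, inferred)]
    return [i for i in range(1, len(agree)) if agree[i] != agree[i - 1]]


def switch_error_rate(truth, inferred, exclude_blips=True):
    """Switch errors divided by the number of adjacent het-site pairs.

    Assumption: with exclude_blips=True, two switches at consecutive
    indices are treated as one flipped site (a blip) and both are dropped.
    Pairing is greedy from left to right.
    """
    n_sites = len(truth)
    if n_sites < 2:
        return 0.0
    switches = switch_positions(truth, inferred)
    if exclude_blips:
        kept = []
        i = 0
        while i < len(switches):
            is_blip = i + 1 < len(switches) and switches[i + 1] == switches[i] + 1
            if is_blip:
                i += 2  # skip both switches that bracket the flipped site
            else:
                kept.append(switches[i])
                i += 1
        switches = kept
    return len(switches) / (n_sites - 1)


if __name__ == "__main__":
    # Hypothetical phase labels at eight heterozygous sites.
    truth = [0, 1, 0, 1, 1, 0, 0, 1]
    inferred = [0, 1, 0, 0, 1, 0, 1, 0]
    print(switch_positions(truth, inferred))
    print(switch_error_rate(truth, inferred, exclude_blips=False))
    print(switch_error_rate(truth, inferred, exclude_blips=True))
```

In this toy input, site 3 is flipped in isolation and a longer switch begins at site 6, so excluding blips leaves one switch out of seven adjacent pairs. A flipped first or last site produces only one switch and is counted as an ordinary switch here, which is a simplification of this sketch.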

Source journal: HGG Advances
Subject area: Biochemistry, Genetics and Molecular Biology (Molecular Medicine)
CiteScore: 4.30
Self-citation rate: 4.50%
Articles published: 69
Review time: 14 weeks