Qi-Xin Zhang, Dovini Jayasinghe, Zhe Zhang, Sang Hong Lee, Hai-Ming Xu, Guo-Bo Chen
Precise estimation of in-depth relatedness in biobank-scale datasets using deepKin
Cell Reports Methods, 101053 (published 2025-06-16; Epub 2025-05-27)
DOI: 10.1016/j.crmeth.2025.101053 (https://doi.org/10.1016/j.crmeth.2025.101053)
Abstract
Accurate relatedness estimation is essential in biobank-scale genetic studies. We present deepKin, a method-of-moments framework that accounts for sampling variance to enable statistical inference and classification of relatedness. Unlike traditional methods using fixed thresholds, deepKin computes data-specific significance thresholds, determines the minimum effective number of markers, and estimates the statistical power to detect distant relatives. Through simulations, we demonstrate that deepKin accurately infers both unrelated pairs and relatives by leveraging sampling variance. In the UK Biobank (UKB), analysis of the 3K Oxford subset showed that SNP sets with a larger effective number of markers provided greater power for detecting distant relatives. In the White British subset, deepKin identified over 212,000 significant relative pairs, categorized into six degrees, and revealed their geographic patterns across 19 UKB assessment centers through within-cohort and cross-cohort relatedness estimation. An R package (deepKin) is available on GitHub.
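The relationship the abstract describes between significance threshold, effective number of markers, and power can be illustrated with a small sketch. This is not the deepKin R package or its API, and the formulas are assumptions made here for illustration: the relatedness estimate is treated as approximately normal with sampling variance 1/m_e (m_e being the effective number of markers) under both the null and the alternative, multiple testing is handled by a Bonferroni correction over all pairs, and a degree-d relative is assumed to have expected relatedness 0.5^d. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution, used for quantiles and tail probabilities.
_STD_NORMAL = NormalDist()


def significance_threshold(m_e, n_pairs, alpha=0.05):
    """One-sided, Bonferroni-corrected cutoff for a relatedness estimate.

    Assumption (not taken from the paper): under the null of no relatedness
    the estimate is roughly N(0, 1 / m_e).
    """
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / n_pairs)
    return z_alpha / sqrt(m_e)


def detection_power(theta, m_e, n_pairs, alpha=0.05):
    """Probability that a pair with true relatedness theta exceeds the cutoff.

    Simplifying assumption: the sampling variance under the alternative is
    also taken to be 1 / m_e.
    """
    cutoff = significance_threshold(m_e, n_pairs, alpha)
    return 1.0 - _STD_NORMAL.cdf((cutoff - theta) * sqrt(m_e))


def minimum_m_e(theta, n_pairs, alpha=0.05, power=0.9):
    """Smallest m_e that reaches the requested power under the same assumptions."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / n_pairs)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ((z_alpha + z_beta) / theta) ** 2


if __name__ == "__main__":
    n = 3000                      # e.g. a cohort of about 3,000 individuals
    n_pairs = n * (n - 1) // 2    # number of distinct pairs tested
    for degree in range(1, 7):
        theta = 0.5 ** degree     # assumed expected relatedness for this degree
        needed = minimum_m_e(theta, n_pairs)
        print(f"degree {degree}: theta={theta:.4f}, minimum m_e ~ {needed:.0f}")
```

Under these assumptions the cutoff shrinks as m_e grows, which is why a SNP set with a larger effective number of markers has more power for distant relatives, and why the threshold depends on the data (through m_e and the number of pairs) rather than being fixed in advance.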