On ARGs, pedigrees, and genetic relatedness matrices
Brieuc Lehmann, Hanbin Lee, Luke Anderson-Trocmé, Jerome Kelleher, Gregor Gorjanc, Peter L Ralph
Genetics, published 2025-10-08. DOI: https://doi.org/10.1093/genetics/iyaf219
Abstract
Genetic relatedness is a central concept in genetics, underpinning studies of population and quantitative genetics in human, animal, and plant settings. It is typically stored as a genetic relatedness matrix (GRM), whose elements are pairwise relatedness values between individuals. Relatedness has been defined in various contexts based on pedigree, genotype, phylogeny, coalescent times, and, recently, the ancestral recombination graph (ARG). For some downstream applications, including association studies, ARG-based GRMs have performed better than the genotype GRM. However, these GRMs present computational challenges because of their inherent quadratic time and space complexity. Here, we first discuss the different definitions of relatedness in a unifying context, using the additive model of a quantitative trait to define "branch relatedness" and the corresponding "branch GRM". We explore the relationship between branch relatedness and pedigree relatedness (i.e., kinship) through a case study of French-Canadian individuals with a known pedigree. Using the tree sequence encoding of an ARG, we then derive an efficient algorithm for computing products between the branch GRM and a general vector without explicitly forming the branch GRM. This algorithm leverages the sparse encoding of genomes in the tree sequence and hence enables large-scale computations with the branch GRM. We demonstrate the power of this algorithm by developing a randomized principal components algorithm for tree sequences that easily scales to millions of genomes. All algorithms are implemented in the open source tskit Python package. Taken together, this work consolidates the different notions of relatedness as branch relatedness and, by leveraging the tree sequence encoding of an ARG, provides efficient algorithms that make branch GRM computations feasible on mega-scale genomic datasets.
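The idea of working with a GRM only through matrix-vector products, and building a randomized principal components method on top of that, can be illustrated with a small NumPy sketch. This is not the paper's algorithm and does not use tskit or tree sequences: as a stand-in it assumes a dense genotype matrix and a genotype-style GRM G = Xc Xcᵀ / m, and it uses a generic randomized subspace iteration. The function names `make_implicit_grm_matvec` and `randomized_eigh`, the simulated two-group data, and all parameter defaults are hypothetical choices made for illustration.

```python
import numpy as np


def make_implicit_grm_matvec(genotypes):
    """Return a function computing G @ V for G = Xc Xc^T / m without forming G.

    `genotypes` is an (n_samples, n_sites) array; Xc is its column-centred copy.
    Assumption: a genotype-style GRM stands in for the branch GRM here.
    """
    centred = genotypes - genotypes.mean(axis=0)
    n_sites = genotypes.shape[1]

    def matvec(v):
        # Two thin products instead of building the n x n matrix.
        return centred @ (centred.T @ v) / n_sites

    return matvec


def randomized_eigh(matvec, n, k, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-k eigenpairs of a symmetric PSD operator.

    Only products with the operator are needed, never its entries.
    """
    rng = np.random.default_rng(seed)
    # Random test matrix with a few extra columns for accuracy.
    q = rng.standard_normal((n, k + oversample))
    q, _ = np.linalg.qr(matvec(q))
    # Power iterations sharpen the captured subspace.
    for _ in range(n_power_iter):
        q, _ = np.linalg.qr(matvec(q))
    # Project the operator onto the subspace: a small symmetric matrix.
    small = q.T @ matvec(q)
    small = (small + small.T) / 2
    evals, evecs = np.linalg.eigh(small)
    order = np.argsort(evals)[::-1][:k]
    return evals[order], q @ evecs[:, order]


# Toy data: two groups of 100 samples with different allele frequencies.
rng = np.random.default_rng(1)
n_per_group, n_sites = 100, 500
freq_a = rng.uniform(0.1, 0.9, n_sites)
freq_b = rng.uniform(0.1, 0.9, n_sites)
genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_group, n_sites)),
    rng.binomial(2, freq_b, size=(n_per_group, n_sites)),
]).astype(float)

matvec = make_implicit_grm_matvec(genotypes)
evals, evecs = randomized_eigh(matvec, n=genotypes.shape[0], k=5)
print(evals)
```

In this toy setting the memory cost is that of the genotype matrix plus a few tall, thin arrays, rather than the full n × n GRM. The abstract describes the analogous saving for the branch GRM, where the product is computed from the tree sequence instead of from a dense matrix.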
About the journal:
GENETICS is published by the Genetics Society of America, a scholarly society that seeks to deepen our understanding of the living world by advancing our understanding of genetics. Since 1916, GENETICS has published high-quality, original research presenting novel findings bearing on genetics and genomics. The journal publishes empirical studies of organisms ranging from microbes to humans, as well as theoretical work.
While it has an illustrious history, GENETICS has changed along with the communities it serves: it is not your mentor's journal.
The editors make decisions quickly – in around 30 days – without sacrificing the excellence and scholarship for which the journal has long been known. GENETICS is a peer-reviewed, peer-edited journal with an international reach and increasing visibility and impact. All editorial decisions are made through the collaboration of at least two editors who are practicing scientists.
GENETICS is constantly innovating: expanded types of content include Reviews, Commentary (current issues of interest to geneticists), Perspectives (historical), Primers (to introduce primary literature into the classroom), Toolbox Reviews, plus YeastBook, FlyBook, and WormBook (coming spring 2016). For particularly time-sensitive results, we publish Communications. As part of our mission to serve our communities, we've published thematic collections, including Genomic Selection, Multiparental Populations, Mouse Collaborative Cross, and the Genetics of Sex.