A model-free method for genealogical inference without phasing and its application for topology weighting.

Genetics. Published 2025-09-08. DOI: 10.1093/genetics/iyaf181
Simon H Martin

Abstract

Recent advances in methods to infer and analyse ancestral recombination graphs (ARGs) are providing powerful new insights in evolutionary biology and beyond. Existing inference approaches tend to be designed for use with fully-phased datasets, and some rely on model assumptions about demography and recombination rate. Here I describe a simple model-free approach for genealogical inference along the genome from unphased genotype data called Sequential Tree Inference by Collecting Compatible Sites (sticcs). sticcs applies a heuristic algorithm based on the perfect phylogeny principle to reconstruct a local genealogy at each variant site in the genome, using a 'collecting' procedure to import information from other nearby sites. Using simulations, I show that sticcs is accurate for ARG inference, but only when the sample size is small. However, I also describe how it can be applied for the purpose of topology weighting by 'stacking' tree sequences inferred for multiple subsets of the data, removing the sample size restriction. Topology weights estimated in this way from unphased data tend to be more accurate than those computed with full ARGs inferred from perfectly phased data using several popular tools. The methods presented therefore have promise for analysis of relatedness and introgression in non-model species, including polyploids. The new methods are implemented in two Python packages, sticcs (for ARG inference) and twisst2 (for topology weighting using the stacking procedure), both of which interface with the tskit library for analysis of tree sequence objects.
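
The perfect phylogeny principle that sticcs builds on can be illustrated for the simpler case of phased, polarised haplotypes. The sketch below is a hypothetical illustration, not the sticcs implementation. It assumes 0/1 coding with 1 as the derived allele, and it ignores the unphased genotype handling that sticcs is designed for. The function names (`derived_set`, `compatible`, `collect_compatible`) and the greedy "stop at the first conflict" rule are invented for this example. Two polarised sites fit on one rooted tree only if the sets of samples carrying their derived alleles are disjoint or nested (the four-gamete test with a known ancestral state). A pairwise nested-or-disjoint family of such sets is exactly the clade set of one, possibly unresolved, local tree.

```python
def derived_set(site_alleles):
    """Samples (by index) carrying the derived allele, coded 1, at one biallelic site."""
    return frozenset(i for i, allele in enumerate(site_alleles) if allele == 1)


def compatible(a, b):
    """Perfect phylogeny test for two polarised biallelic sites: they can sit on one
    rooted tree without recurrent mutation iff their derived-allele sets are
    disjoint or nested (the four-gamete test when the ancestral allele is known)."""
    return a.isdisjoint(b) or a <= b or b <= a


def collect_compatible(sites, focal):
    """Hypothetical greedy sketch, not the sticcs algorithm: from the focal site,
    scan left and then right, keeping each site that is compatible with every site
    kept so far and stopping in that direction at the first conflict."""
    kept = [sites[focal]]
    for step in (-1, 1):
        j = focal + step
        while 0 <= j < len(sites):
            if not all(compatible(sites[j], k) for k in kept):
                break  # treat a conflict as a likely recombination boundary
            kept.append(sites[j])
            j += step
    return kept


# Toy data: rows are sites, columns are four phased haplotypes.
haplotypes = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
sites = [derived_set(row) for row in haplotypes]
clades = collect_compatible(sites, focal=1)
print(sorted(sorted(c) for c in clades))  # [[0], [0, 1], [2, 3]]
```

In this toy example the fourth site ({0, 2}) conflicts with the clade {0, 1}, so the rightward scan stops there. The collected sets describe a local tree in which haplotypes 0 and 1 group together, as do 2 and 3.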

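Topology weighting summarises each local tree by the fraction of "one sample per group" subtrees that match each possible group-level topology. With four groups there are three unrooted topologies. The following is a minimal, twisst-style sketch under stated assumptions. Trees are plain nested tuples of unique sample names rather than tskit tree sequences. The helpers `clade_leaf_sets`, `quartet_split` and `topology_weights` are hypothetical and are not the twisst2 API. Unresolved quartets (polytomies) are kept in the denominator but assigned to no topology, which is one simple choice among several.

```python
from itertools import product


def clade_leaf_sets(tree, out):
    """Append the leaf set of every internal node of a nested-tuple tree to `out`
    (children before parents) and return the leaf set of `tree` itself."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    leaves = frozenset().union(*(clade_leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def quartet_split(clades, quartet):
    """Unrooted split induced on four sampled leaves, as {pair, other pair}.
    A clade holding exactly two of the four leaves defines the split; if none
    does (a polytomy), the quartet is unresolved and None is returned."""
    q = frozenset(quartet)
    for clade in clades:
        pair = clade & q
        if len(pair) == 2:
            return frozenset([pair, q - pair])
    return None


def topology_weights(tree, groups):
    """Twisst-style weights for four groups: the fraction of all combinations that
    sample one leaf per group whose induced subtree shows each group-level split."""
    clades = []
    clade_leaf_sets(tree, clades)
    label = {leaf: g for g, members in groups.items() for leaf in members}
    counts, total = {}, 0
    for combo in product(*groups.values()):
        total += 1
        split = quartet_split(clades, combo)
        if split is None:
            continue  # unresolved quartets count towards the total but no topology
        key = frozenset(frozenset(label[leaf] for leaf in side) for side in split)
        counts[key] = counts.get(key, 0) + 1
    return {key: n / total for key, n in counts.items()}


# Toy tree with two samples in group A and one in each of B, C and D.
tree = ((("a1", "b1"), "c1"), ("a2", "d1"))
groups = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1"], "D": ["d1"]}
for split, weight in topology_weights(tree, groups).items():
    print(" | ".join("".join(sorted(side)) for side in sorted(split, key=sorted)), weight)
# AB | CD 0.5
# AD | BC 0.5
```

The quartet (a1, b1, c1, d1) supports AB | CD while (a2, b1, c1, d1) supports AD | BC, so each topology receives half the weight. Enumerating every combination grows multiplicatively with group sizes. This exhaustive approach is only practical for small samples or for a random subset of combinations.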