Lower Bounds on the Sample Complexity of Species Tree Estimation when Substitution Rates Vary Across Loci
Max Hill, Sebastien Roch
Bulletin of Mathematical Biology 87(11), 152. Published 2025-09-27.
DOI: 10.1007/s11538-025-01533-y (https://doi.org/10.1007/s11538-025-01533-y)
Citation count: 0
Abstract
In this paper we analyze the effect of substitution rate heterogeneity on the sample complexity of species tree estimation. We consider a model based on the multi-species coalescent (MSC), with the addition that gene trees exhibit random i.i.d. rates of substitution. Our first result is a lower bound on the number of loci needed to distinguish 2-leaf trees (i.e., pairwise distances) with high probability, when substitution rates satisfy a growth condition. In particular, we show that to distinguish two distances differing by length $f$ with high probability, one requires $\Omega(f^{-2})$ loci, a significantly higher bound than in the constant-rate case. The second main result is a lower bound on the amount of data needed to reconstruct a 3-leaf species tree with high probability, when mutation rates are gamma-distributed. In this case as well, we show that the number of gene trees must grow as $\Omega(f^{-2})$.
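For orientation, one standard route to lower bounds of the form $\Omega(f^{-2})$ goes through the Hellinger distance between the per-locus data distributions. The sketch below is a generic illustration and not necessarily the authors' argument. The symbols $P_f$, $Q_f$ (per-locus distributions under the two hypotheses), $m$ (number of loci), $C$ (a constant) and $\delta < 1/2$ (allowed error probability) are assumptions introduced here, as is the premise $H(P_f, Q_f) \le C f$, which would have to be established for the specific model.

```latex
% Requires amsmath. Assumption (not stated in the abstract): the per-locus
% distributions satisfy H(P_f, Q_f) <= C f for some constant C.
\begin{align*}
H^2(P, Q) &= \int \bigl(\sqrt{dP} - \sqrt{dQ}\bigr)^2, \\
H^2\bigl(P_f^{\otimes m}, Q_f^{\otimes m}\bigr)
  &= 2\Bigl[1 - \bigl(1 - \tfrac{1}{2} H^2(P_f, Q_f)\bigr)^m\Bigr]
  \le m\, H^2(P_f, Q_f), \\
\mathrm{TV}\bigl(P_f^{\otimes m}, Q_f^{\otimes m}\bigr)
  &\le H\bigl(P_f^{\otimes m}, Q_f^{\otimes m}\bigr)
  \le \sqrt{m}\, C f .
\end{align*}
% Any test whose error probability is at most \delta under both hypotheses
% needs TV >= 1 - 2\delta, which forces
\[
m \;\ge\; \frac{(1 - 2\delta)^2}{C^2 f^2} \;=\; \Omega\bigl(f^{-2}\bigr).
\]
```

The first inequality is Bernoulli's inequality applied to the product formula for independent loci; the second is the usual bound of total variation by Hellinger distance. Under this template, the model-specific work lies in bounding the per-locus Hellinger distance by a constant multiple of $f$.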
About the journal:
The Bulletin of Mathematical Biology, the official journal of the Society for Mathematical Biology, disseminates original research findings and other information relevant to the interface of biology and the mathematical sciences. Contributions should have relevance to both fields. In order to accommodate the broad scope of new developments, the journal accepts a variety of contributions, including:
Original research articles focused on new biological insights gained with the help of tools from the mathematical sciences or new mathematical tools and methods with demonstrated applicability to biological investigations
Research in mathematical biology education
Reviews
Commentaries
Perspectives, and contributions that discuss issues important to the profession
All contributions are peer-reviewed.