在无偏误差和缺失模型下，四重奏使细胞谱系树的统计一致估计成为可能。

IF 1.5 4区生物学 Q4 BIOCHEMICAL RESEARCH METHODS

Algorithms for Molecular Biology Pub Date : 2023-12-01 DOI:10.1186/s13015-023-00248-w

Yunheng Han, Erin K Molloy

{"title":"在无偏误差和缺失模型下，四重奏使细胞谱系树的统计一致估计成为可能。","authors":"Yunheng Han, Erin K Molloy","doi":"10.1186/s13015-023-00248-w","DOIUrl":null,"url":null,"abstract":"Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"18 1","pages":"19"},"PeriodicalIF":1.5000,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691101/pdf/","citationCount":"0","resultStr":"{\"title\":\"Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model.\",\"authors\":\"Yunheng Han, Erin K Molloy\",\"doi\":\"10.1186/s13015-023-00248-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.\",\"PeriodicalId\":50823,\"journal\":{\"name\":\"Algorithms for Molecular Biology\",\"volume\":\"18 1\",\"pages\":\"19\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2023-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691101/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Algorithms for Molecular Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s13015-023-00248-w\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-023-00248-w","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

摘要

通过重建肿瘤细胞的进化历史，可以了解癌症的进展和治疗。虽然有许多方法可以从分子序列中估计进化树(称为系统发生)，但传统的方法假设输入数据是无错误的，输出树是完全解析的。这些假设在肿瘤系统发育学中受到了挑战，因为单细胞测序产生的数据稀疏且充满错误，而且肿瘤是克隆进化的。在此，针对这些障碍，我们研究了基于四叶无根系统发育树的方法的理论效用。我们考虑一种流行的肿瘤系统发育模型，其中突变出现在(高度未解决的)树上，然后引入(无偏)误差和缺失值。四重奏是由两个细胞中存在的突变和两个细胞中不存在的突变所暗示的。我们的主要结果是最可能的四重奏识别了四个单元格上的无根模型树。这促使人们寻找这样一棵树，使它与输入突变之间共享的四元数最大化。证明了该问题的最优解是无根细胞谱系树的一致估计;这种保证包括模型树高度未解析的情况，错误定义为假阴性分支的数量。最后，我们概述了当存在拷贝数畸变和肿瘤系统发育特有的其他挑战时，如何采用基于四分体的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model.

Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Algorithms for Molecular Biology 生物-生化研究方法

CiteScore

2.40

自引率

10.00%

发文量

审稿时长

>12 weeks

期刊介绍： Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.