Inference of Huge Trees under Maximum Likelihood

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum. Pub Date: 2012-05-21. DOI: 10.1109/IPDPSW.2012.309

F. Izquierdo-Carrasco, A. Stamatakis



Abstract



The wide adoption of Next-Generation Sequencing technologies in recent years has generated an avalanche of genetic data, which poses new challenges for large-scale maximum likelihood-based phylogenetic analyses. Improving the scalability of search algorithms and reducing the high memory requirements for computing the likelihood represent major computational challenges in this context. We have introduced methods for solving these key problems and provided respective proof-of-concept implementations. Moreover, we have developed a new tree search strategy that can reduce run times by more than 50% while yielding equally good trees (in the statistical sense). To reduce memory requirements, we explored the applicability of external memory (out-of-core) algorithms as well as a concept that trades memory for additional computations in the likelihood function. The latter concept induces only a surprisingly small increase in overall execution times: when trading 50% of the required RAM for additional computations, the average execution time increase caused by those additional computations amounts to only 15%. All concepts presented here are sufficiently generic that they can be applied to all programs that rely on the phylogenetic likelihood function. The approaches we have developed will thereby help enable large-scale inferences of whole-genome phylogenies.
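The idea of trading memory for additional computations can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a tiny hard-coded rooted tree, a single alignment site, the Jukes-Cantor model, and a simple least-recently-used eviction policy. All names (`TREE`, `TIPS`, `CLVStore`, `run`) are hypothetical. The sketch keeps only a limited number of per-node conditional likelihood vectors in memory and recomputes an evicted vector when it is needed again.

```python
import math
from collections import OrderedDict

# Hypothetical toy data: inner node -> (left child, right child); tips hold one nucleotide.
TREE = {"root": ("x", "y"), "x": ("A", "B"), "y": ("C", "z"), "z": ("D", "E")}
TIPS = {"A": "A", "B": "A", "C": "C", "D": "G", "E": "G"}
STATES = "ACGT"
BRANCH = 0.1  # assumed: one branch length shared by all branches


def jc_prob(same, t):
    """Jukes-Cantor transition probability along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


class CLVStore:
    """Holds at most `slots` inner-node vectors; evicted vectors are recomputed on demand."""

    def __init__(self, slots):
        self.slots = slots
        self.cache = OrderedDict()
        self.computed = 0  # number of inner-node vector computations performed

    def clv(self, node):
        if node in TIPS:
            return [1.0 if s == TIPS[node] else 0.0 for s in STATES]
        if node in self.cache:
            self.cache.move_to_end(node)  # mark as recently used
            return self.cache[node]
        left, right = (self.clv(child) for child in TREE[node])
        vec = []
        for a in STATES:
            l = sum(jc_prob(a == b, BRANCH) * left[i] for i, b in enumerate(STATES))
            r = sum(jc_prob(a == b, BRANCH) * right[i] for i, b in enumerate(STATES))
            vec.append(l * r)
        self.computed += 1
        self.cache[node] = vec
        while len(self.cache) > self.slots:
            self.cache.popitem(last=False)  # evict the least recently used vector
        return vec


def run(slots, rounds=2):
    """Query every inner node `rounds` times, loosely mimicking repeated access in a search."""
    store = CLVStore(slots)
    for _ in range(rounds):
        for node in ("x", "z", "y", "root"):
            store.clv(node)
    likelihood = sum(0.25 * v for v in store.clv("root"))
    return likelihood, store.computed


full = run(slots=4)  # every inner-node vector fits in memory
half = run(slots=2)  # only half of the vectors fit in memory
print("full memory:", full)
print("half memory:", half)
```

Both runs yield the same likelihood; the run with fewer slots performs more vector computations. The size of that overhead in this toy says nothing about the 15% figure reported above, which depends on the authors' own strategy and datasets.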
