{"title":"基于似然系统发育定位的高效内存管理","authors":"P. Barbera, A. Stamatakis","doi":"10.1109/IPDPSW52791.2021.00041","DOIUrl":null,"url":null,"abstract":"Maximum likelihood based phylogenetic methods score phylogenetic tree topologies comprising a set of molecular sequences of the species under study, using statistical models of evolution. The scoring procedure relies on storing intermediate results at inner nodes of the tree during the tree traversal. This induces comparatively high memory requirements compared to less compute-intensive methods such as parsimony, for instance.The memory requirements are particularly large for maximum likelihood phylogenetic placement, as further intermediate results should be stored at all branches of the tree to maximize runtime performance. This has hindered numerous users of our phylogenetic placement tool EPA-NG from performing placement on large phylogenetic trees.Here, we present an approach to reduce the memory footprint of EPA-NG. Further, we have generalized our implementation and integrated it into our phylogenetic likelihood library, libpll-2, such that it can be used by other tools for phylogenetic inference. On an empirical dataset, we were able to reduce the memory requirements by up to 96% at the cost of increasing execution times by 23 times. Hence, there exists a trade-off between decreasing memory requirements and increasing execution times which we investigate. When increasing the amount of memory available for placement to a certain level, execution times are only approximately 4 times lower for the most challenging dataset we have tested. This now allows for conducting maximum likelihood based placement on substantially larger trees within reasonable times. Finally, we show that the active memory management approach introduces new challenges for parallelization and outline possible solutions.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient Memory Management in Likelihood-based Phylogenetic Placement\",\"authors\":\"P. Barbera, A. Stamatakis\",\"doi\":\"10.1109/IPDPSW52791.2021.00041\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Maximum likelihood based phylogenetic methods score phylogenetic tree topologies comprising a set of molecular sequences of the species under study, using statistical models of evolution. The scoring procedure relies on storing intermediate results at inner nodes of the tree during the tree traversal. This induces comparatively high memory requirements compared to less compute-intensive methods such as parsimony, for instance.The memory requirements are particularly large for maximum likelihood phylogenetic placement, as further intermediate results should be stored at all branches of the tree to maximize runtime performance. This has hindered numerous users of our phylogenetic placement tool EPA-NG from performing placement on large phylogenetic trees.Here, we present an approach to reduce the memory footprint of EPA-NG. Further, we have generalized our implementation and integrated it into our phylogenetic likelihood library, libpll-2, such that it can be used by other tools for phylogenetic inference. On an empirical dataset, we were able to reduce the memory requirements by up to 96% at the cost of increasing execution times by 23 times. Hence, there exists a trade-off between decreasing memory requirements and increasing execution times which we investigate. When increasing the amount of memory available for placement to a certain level, execution times are only approximately 4 times lower for the most challenging dataset we have tested. This now allows for conducting maximum likelihood based placement on substantially larger trees within reasonable times. Finally, we show that the active memory management approach introduces new challenges for parallelization and outline possible solutions.\",\"PeriodicalId\":170832,\"journal\":{\"name\":\"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW52791.2021.00041\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW52791.2021.00041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Efficient Memory Management in Likelihood-based Phylogenetic Placement
Maximum likelihood based phylogenetic methods score phylogenetic tree topologies comprising a set of molecular sequences of the species under study, using statistical models of evolution. The scoring procedure relies on storing intermediate results at inner nodes of the tree during the tree traversal. This induces comparatively high memory requirements compared to less compute-intensive methods such as parsimony, for instance.The memory requirements are particularly large for maximum likelihood phylogenetic placement, as further intermediate results should be stored at all branches of the tree to maximize runtime performance. This has hindered numerous users of our phylogenetic placement tool EPA-NG from performing placement on large phylogenetic trees.Here, we present an approach to reduce the memory footprint of EPA-NG. Further, we have generalized our implementation and integrated it into our phylogenetic likelihood library, libpll-2, such that it can be used by other tools for phylogenetic inference. On an empirical dataset, we were able to reduce the memory requirements by up to 96% at the cost of increasing execution times by 23 times. Hence, there exists a trade-off between decreasing memory requirements and increasing execution times which we investigate. When increasing the amount of memory available for placement to a certain level, execution times are only approximately 4 times lower for the most challenging dataset we have tested. This now allows for conducting maximum likelihood based placement on substantially larger trees within reasonable times. Finally, we show that the active memory management approach introduces new challenges for parallelization and outline possible solutions.