Scalable Phylogeny Reconstruction with Disaggregated Near-memory Processing

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Pub Date: 2021-12-28, DOI: 10.1145/3484983

Nikolaos S. Alachiotis, P. Skrimponis, Manolis Pissadakis, D. Pnevmatikatos


Citations: 0

Abstract

Disaggregated computer architectures eliminate resource fragmentation in next-generation datacenters by enabling virtual machines to employ resources such as CPUs, memory, and accelerators that are physically located on different servers. While this allows highly compute- and/or memory-intensive applications to potentially use all CPU and/or memory resources in a datacenter, it poses a major challenge to the efficient deployment of hardware accelerators: input/output data can reside on servers other than those hosting the accelerator resources, requiring time- and energy-consuming remote data transfers that diminish the gains of hardware acceleration. Targeting a disaggregated datacenter architecture similar to the IBM dReDBox prototype, the present work explores the potential of deploying custom acceleration units adjacent to the FPGA-implemented disaggregated-memory controller on memory bricks (in dReDBox terminology), in order to reduce data movement and improve performance and energy efficiency when reconstructing large phylogenies (evolutionary relationships among organisms). A fundamental computational kernel of this task is the Phylogenetic Likelihood Function (PLF), which dominates the total execution time (up to 95%) of widely used maximum-likelihood methods. Numerous efforts to boost PLF performance over the years have focused on accelerating computation; however, because the PLF is a data-intensive, memory-bound operation, performance remains limited by data movement, and memory disaggregation only exacerbates the problem.
We describe two near-memory processing models. The first addresses the problem of distributing the workload across memory bricks and is tailored toward larger genomes (e.g., plants and mammals). The second reduces overall memory requirements through memory-side data interpolation that is transparent to the application, allowing the phylogeny to scale to a larger number of organisms without requiring additional memory.
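
To make the memory-bound character of the PLF more concrete, the following is a minimal sketch of one PLF step (Felsenstein's pruning recursion) in plain Python. It is not the authors' implementation: the Jukes-Cantor (JC69) substitution model, the function names, and the toy two-tip data are all assumptions chosen for illustration. For each alignment site, the kernel reads two 4-entry child vectors and writes one 4-entry parent vector while performing only a few dozen arithmetic operations, which is why data movement rather than computation tends to dominate.

```python
import math

def jc69_transition_matrix(branch_length):
    """4x4 transition probabilities under the JC69 model (assumed here for simplicity)."""
    e = math.exp(-4.0 * branch_length / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def plf_combine(left_clv, right_clv, left_p, right_p):
    """Compute a parent's conditional likelihood vectors (CLVs) from its two children.

    Each CLV holds one 4-entry vector (A, C, G, T) per alignment site.
    """
    parent = []
    for l_site, r_site in zip(left_clv, right_clv):
        site = []
        for s in range(4):
            # Probability of each subtree's data given state s at the parent.
            l_sum = sum(left_p[s][x] * l_site[x] for x in range(4))
            r_sum = sum(right_p[s][x] * r_site[x] for x in range(4))
            site.append(l_sum * r_sum)
        parent.append(site)
    return parent

def log_likelihood(root_clv, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Sum per-site log-likelihoods at the root, weighted by base frequencies."""
    return sum(math.log(sum(f * v for f, v in zip(freqs, site))) for site in root_clv)

# Hypothetical toy input: two tips, two sites. Tip A observes (A, A), tip B observes (A, C).
tip_a = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
tip_b = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
p = jc69_transition_matrix(0.1)
root = plf_combine(tip_a, tip_b, p, p)
print(log_likelihood(root))
```

In a real analysis the same loop runs over every site and every inner node of the tree, and is repeated many times during tree search, so the CLVs stream through memory repeatedly. Placing such a kernel next to the memory controller, as the abstract proposes, would keep that traffic local to the memory brick.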

