FP-AMR: A Reconfigurable Fabric Framework for Adaptive Mesh Refinement Applications

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). Pub Date: 2019-04-01. DOI: 10.1109/FCCM.2019.00040

Tianqi Wang, Tong Geng, Xi Jin, M. Herbordt


Citations: 1


FP-AMR: A Reconfigurable Fabric Framework for Adaptive Mesh Refinement Applications

Adaptive mesh refinement (AMR) is one of the most widely used methods in high-performance computing, accounting for a large fraction of all supercomputing cycles. AMR dynamically and adaptively applies computational resources non-uniformly, concentrating them on regions of the model according to their complexity. Because AMR generally uses dynamic, pointer-based data structures, acceleration is challenging, especially in hardware. As far as we are aware, no previous work has been published on accelerating AMR with FPGAs. In this paper, we introduce a reconfigurable fabric framework called FP-AMR. The work is in two parts. In the first, FP-AMR offloads the bulk per-timestep computations to the FPGA; analogous systems have previously done this with GPUs. In the second, we show that the remaining CPU-based tasks, including particle mesh mapping, mesh refinement, and coarsening, can also be mapped efficiently to the FPGA. We have evaluated FP-AMR using the widely used program AMReX and found that a single FPGA outperforms a Xeon E5-2660 CPU server (8 cores) by 21x to 23x, depending on problem size and data distribution.
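To make the idea of spending resolution non-uniformly concrete, here is a minimal one-dimensional sketch of mesh refinement. It is an illustration only, not the FP-AMR hardware design and not the AMReX API. The function names (`refine`, `build_mesh`), the cell representation, and the refinement criterion (the difference of the field between a cell's endpoints) are all assumptions chosen for brevity.

```python
import math


def refine(cells, f, tol, max_level):
    """Split every cell whose variation |f(hi) - f(lo)| exceeds tol.

    cells is a list of (lo, hi, level) tuples sorted by lo.
    Cells already at max_level are never split.
    """
    out = []
    for lo, hi, level in cells:
        if level < max_level and abs(f(hi) - f(lo)) > tol:
            mid = 0.5 * (lo + hi)
            out.append((lo, mid, level + 1))
            out.append((mid, hi, level + 1))
        else:
            out.append((lo, hi, level))
    return out


def build_mesh(f, lo, hi, n_coarse, tol, max_level):
    """Start from a uniform coarse mesh and refine until no cell is split."""
    width = (hi - lo) / n_coarse
    cells = [(lo + i * width, lo + (i + 1) * width, 0) for i in range(n_coarse)]
    while True:
        refined = refine(cells, f, tol, max_level)
        if len(refined) == len(cells):  # nothing was split in this pass
            return cells
        cells = refined


# A steep front near x = 0.5 stands in for a "complex" region (hypothetical data).
def front(x):
    return math.tanh(50.0 * (x - 0.5))


mesh = build_mesh(front, 0.0, 1.0, n_coarse=8, tol=0.1, max_level=5)
for lo, hi, level in mesh:
    print(f"[{lo:.5f}, {hi:.5f}] level {level}")
```

In this sketch, cells far from the front stay at the coarse level while cells near x = 0.5 are split repeatedly. The resulting list has irregular, data-dependent structure, which hints at why the abstract describes AMR data structures as hard to accelerate in hardware. Coarsening, the reverse operation, would merge sibling cells whose variation falls back below the tolerance; it is omitted here.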
