Unexpected Diversity: Quantitative Memory Analysis for Zynq UltraScale+ Systems

2019 International Conference on Field-Programmable Technology (ICFPT), Pub Date: 2019-10-07, DOI: 10.1109/ICFPT47387.2019.00029

Kristiyan Manev, Anuj Vaishnav, Dirk Koch

Citations: 17

Abstract

Memory throughput is one of the major bottlenecks for accelerator performance. Now that Zynq UltraScale+ systems are being deployed everywhere from exascale to the edge, it is important to understand the characteristics of their memory subsystem and the optimizations available to developers. In this paper, we extensively evaluate memory performance and behaviour across various AXI port combinations, burst sizes, access patterns, and numbers of accelerators per AXI port. Our results on the ZCU102 and Ultra96 boards show that 1) the effective throughput of these systems reaches only 75% and 92.5% of the theoretical maximum, respectively, 2) burst sizes of 128 and 192 bytes are often optimal, 3) AXI ports of the same type may not always exhibit similar behaviour, 4) multiplexing accelerators in the PL can provide better throughput distribution than multiplexing in the PS, and 5) using all AXI ports does not lead to the highest performance.
