Unexpected Diversity: Quantitative Memory Analysis for Zynq UltraScale+ Systems

Kristiyan Manev, Anuj Vaishnav, Dirk Koch
2019 International Conference on Field-Programmable Technology (ICFPT), published 2019-10-07
DOI: 10.1109/ICFPT47387.2019.00029
Citations: 17

Abstract

Memory throughput is one of the major bottlenecks for accelerator performance. Now that Zynq UltraScale+ systems are being deployed in settings ranging from exascale computing to the edge, it is important to understand the characteristics of their memory subsystem and the optimizations available to developers. In this paper, we extensively evaluate memory performance and behaviour across various AXI port combinations, burst sizes, access patterns, and numbers of accelerators per AXI port. Our results on the ZCU102 and Ultra96 boards show that 1) the effective throughput of these systems reaches only 75% and 92.5% of the theoretical maximum, respectively, 2) burst sizes of 128 and 192 bytes are often optimal, 3) AXI ports of the same type do not always exhibit similar behaviour, 4) multiplexing accelerators in the programmable logic (PL) can provide a better throughput distribution than multiplexing in the processing system (PS), and 5) using all AXI ports does not lead to the highest performance.
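The arithmetic behind findings 1 and 2 can be sketched as follows. An AXI burst of B bytes on a port that is W bits wide takes B / (W/8) data beats, so on an assumed 128-bit port the 128- and 192-byte bursts correspond to 8 and 12 beats. The theoretical maximum of one transfer direction (read or write channel) is one beat per clock cycle, i.e. the width in bytes times the clock frequency, and effective throughput is the measured rate divided by that peak. The function names, the 128-bit width, and the 300 MHz clock used below are hypothetical illustrations, not the configuration or measurements reported in the paper.

```python
# Hypothetical helpers for interpreting AXI burst sizes and throughput
# efficiency. Widths and clock rates passed in are illustrative
# assumptions, not values taken from the paper.

MAX_INCR_BEATS = 256  # AXI4 INCR bursts allow 1..256 beats


def burst_beats(burst_bytes: int, data_width_bits: int) -> int:
    """Number of data beats for a burst of burst_bytes on a port of
    data_width_bits; the burst must be a whole number of beats."""
    beat_bytes = data_width_bits // 8
    if burst_bytes % beat_bytes != 0:
        raise ValueError("burst size must be a multiple of the beat size")
    beats = burst_bytes // beat_bytes
    if not 1 <= beats <= MAX_INCR_BEATS:
        raise ValueError("beat count outside the AXI4 INCR range 1..256")
    return beats


def peak_bytes_per_s(data_width_bits: int, clock_hz: float) -> float:
    """Theoretical peak of one channel direction: one beat per cycle."""
    return data_width_bits / 8 * clock_hz


def efficiency(measured_bytes_per_s: float, peak: float) -> float:
    """Effective throughput as a fraction of the theoretical maximum."""
    return measured_bytes_per_s / peak


if __name__ == "__main__":
    # Assumed 128-bit port clocked at 300 MHz (hypothetical example).
    peak = peak_bytes_per_s(128, 300e6)
    print(burst_beats(128, 128), burst_beats(192, 128))
    print(peak, efficiency(3.6e9, peak))
```

Under these assumptions a 128-bit port at 300 MHz peaks at 4.8 GB/s per direction, and a measured 3.6 GB/s would correspond to 75% of that maximum; the actual peak of a given port depends on the width and PL clock chosen in the design.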