A Memory-Efficient Implementation of a Plasmonics Simulation Application on SX-ACE

R. Mathur, Hiroshi Matsuoka, Osamu Watanabe, A. Musa, Ryusuke Egawa, Hiroaki Kobayashi
Int. J. Netw. Comput., Journal Article, published 2016-07-25. DOI: 10.15803/IJNC.6.2_243 (https://doi.org/10.15803/IJNC.6.2_243)

Abstract

Recent scientific and engineering simulations require heavy computation over large volumes of data, so high-performance computing (HPC) systems need both high computational capability and large memory capacity. Most recent HPC systems adopt a parallel processing architecture in which the computational capability of the processors keeps increasing while the performance of the memory system remains constrained. As a result, the bytes per flop (B/F) ratio, that is, the ratio of memory bandwidth to flop/s, has decreased as HPC systems have evolved. To fully exploit the potential of recent HPC systems and to meet the increasing demand for large memory, practical scientific and engineering applications must be optimized with attention not only to their parallelism but also to the limitations of the memory subsystems. In this paper, we discuss a set of approaches for optimizing the memory access behavior of applications so that they run with improved performance on recent HPC systems. Our approaches include controlling the memory footprint, restructuring data structures for active elements, eliminating redundant data structures through combined calculations, and optimized recalculation of data. To validate the effectiveness of our approaches, a plasmonics simulation application is evaluated on the vector platforms NEC SX-ACE and NEC SX-9 and on the Intel Xeon-based platform NEC LX 406-Re2. With our approaches applied, the memory usage of the plasmonics simulation application is reduced to as little as about 1/71 of the original, which makes execution possible on a single node of a distributed parallel system with smaller memory capacity. The optimization also yields 1.14 times faster execution on SX-ACE and 1.81 times faster execution on LX 406-Re2.
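The abstract does not give implementation details, so the following is only a minimal sketch of what "restructuring of data structures for active elements" could look like in general. It is not the authors' code. The grid size, the activity rule, and the names `compact_active` and `footprint_bytes` are all hypothetical and chosen for illustration. The idea is to store values only for cells that take part in the computation, together with their indices, instead of keeping a dense array over the whole grid.

```python
from array import array


def compact_active(dense, is_active):
    """Return (indices, values) holding only the active cells of `dense`.

    `is_active` is a hypothetical predicate on the cell index; a real
    application would derive activity from its own geometry or physics.
    """
    indices = array("q")  # 8-byte integer positions of active cells
    values = array("d")   # 8-byte floating-point values at those positions
    for i, v in enumerate(dense):
        if is_active(i):
            indices.append(i)
            values.append(v)
    return indices, values


def footprint_bytes(*arrays):
    """Sum the payload size in bytes of the given typed arrays."""
    return sum(a.itemsize * len(a) for a in arrays)


# Hypothetical 1-D grid of 10,000 cells in which every 100th cell is active.
n = 10_000
dense = array("d", (float(i) for i in range(n)))
indices, values = compact_active(dense, lambda i: i % 100 == 0)

dense_bytes = footprint_bytes(dense)
compact_bytes = footprint_bytes(indices, values)
print(dense_bytes, compact_bytes)
```

In this toy setup the compact form pays for an extra index array, so it only saves memory when the fraction of active cells is small. The reduction factor reported in the abstract comes from the combination of all the listed approaches, not from this sketch.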