Analyzing Resource Utilization in an HPC System: A Case Study of NERSC Perlmutter

ICT systems security and privacy protection : 32nd IFIP TC 11 International Conference, SEC 2017, Rome, Italy, May 29-31, 2017, Proceedings. IFIP TC11 International Information Security Conference (32nd : 2017 : Rome, Italy) Pub Date : 2023-01-12 DOI:10.48550/arXiv.2301.05145

Jie Li, George Michelogiannakis, B. Cook, Dulanya Cooray, Yong Chen

Citations: 0

Abstract

Analyzing Resource Utilization in an HPC System: A Case Study of NERSC Perlmutter

Resource demands of HPC applications vary significantly. However, HPC systems commonly assign resources primarily on a per-node basis to prevent interference from co-located workloads. This gap between coarse-grained resource allocation and varying resource demands can leave HPC resources underutilized. In this study, we analyze the resource usage and application behavior of NERSC's Perlmutter, a state-of-the-art open-science HPC system with both CPU-only and GPU-accelerated nodes. Our one-month usage analysis reveals that CPUs are commonly not fully utilized, especially for GPU-enabled jobs. Around 64% of both CPU and GPU-enabled jobs used 50% or less of the available host memory capacity. Additionally, about 50% of GPU-enabled jobs used up to 25% of the GPU memory, and across all jobs the memory capacity was, in one respect or another, not fully utilized. Although our study comes early in Perlmutter's lifetime, so policies and application workloads may change, it provides valuable insights into performance characterization and application behavior, and it motivates systems with more fine-grained resource allocation.
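
Figures such as "around 64% of jobs used 50% or less of the available host memory capacity" are shares of jobs below a utilization threshold. The following is a minimal sketch of that kind of calculation, not the authors' analysis pipeline. The `JobSample` record, its field names, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobSample:
    """Hypothetical per-job record; field names are assumptions, not the study's schema."""
    job_id: str
    peak_mem_gb: float   # peak host memory the job used
    capacity_gb: float   # host memory available to the job


def fraction_at_or_below(jobs, threshold):
    """Return the share of jobs whose peak memory utilization is <= threshold."""
    if not jobs:
        return 0.0
    low = sum(1 for j in jobs if j.peak_mem_gb / j.capacity_gb <= threshold)
    return low / len(jobs)


# Made-up records for illustration only; these are not data from the study.
jobs = [
    JobSample("a", 64.0, 512.0),    # 12.5% utilization
    JobSample("b", 256.0, 512.0),   # 50% utilization
    JobSample("c", 400.0, 512.0),   # about 78% utilization
    JobSample("d", 100.0, 256.0),   # about 39% utilization
]

share = fraction_at_or_below(jobs, 0.50)
print(share)  # 0.75: three of the four sample jobs used 50% or less
```

The same function applied with a threshold of 0.25 to GPU memory records would give the kind of share quoted for GPU-enabled jobs, assuming per-job peak GPU memory is available in the monitoring data.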
