Topology-aware optimizations for multi-GPU ptychographic image reconstruction

ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing Pub Date : 2021-06-03 DOI:10.1145/3447818.3460380

Xiaodong Yu, Tekin Bicer, R. Kettimuthu, Ian T Foster

{"title":"Topology-aware optimizations for multi-GPU ptychographic image reconstruction","authors":"Xiaodong Yu, Tekin Bicer, R. Kettimuthu, Ian T Foster","doi":"10.1145/3447818.3460380","DOIUrl":null,"url":null,"abstract":"Ptychography is an advanced high-resolution X-ray imaging technique that can generate extremely large datasets. Ptychographic reconstruction transforms reciprocal space experimental data to high-resolution 2D real-space images. GPUs have been used extensively to meet the computational requirements of the reconstruction. Generic multi-GPU reconstruction solutions use common communication topologies, such as P2P graph and ring, that are provided by MPI and NCCL libraries, to establish inter-GPU communications. However, these common topologies assume homogeneous physical links between GPUs, resulting in sub-optimal performance on heterogeneous configurations that are composed of both high- (e.g., NVLink) and low-speed (e.g., PCIe) interconnects. This mismatch between application-level communication topology and physical interconnection can cause data transfer congestion, inefficient memory access, and under-utilization of network resources. Here we present topology-aware designs and optimizations to address the aforementioned mismatch and boost end-to-end application performance. We introduce topology-aware data splitting, propose a novel communication topology, and incorporate asynchronous data movement and computation. We evaluate our design and optimizations using real and artificial datasets and compare its performance with that of the direct P2P and NCCL-based approaches. The results show that our optimizations always outperform the counterparts and achieve up to 5.13× and 1.63× communication and end-to-end application speedups, respectively.","PeriodicalId":73273,"journal":{"name":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","volume":"514 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3447818.3460380","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

Abstract

Ptychography is an advanced high-resolution X-ray imaging technique that can generate extremely large datasets. Ptychographic reconstruction transforms reciprocal space experimental data to high-resolution 2D real-space images. GPUs have been used extensively to meet the computational requirements of the reconstruction. Generic multi-GPU reconstruction solutions use common communication topologies, such as P2P graph and ring, that are provided by MPI and NCCL libraries, to establish inter-GPU communications. However, these common topologies assume homogeneous physical links between GPUs, resulting in sub-optimal performance on heterogeneous configurations that are composed of both high- (e.g., NVLink) and low-speed (e.g., PCIe) interconnects. This mismatch between application-level communication topology and physical interconnection can cause data transfer congestion, inefficient memory access, and under-utilization of network resources. Here we present topology-aware designs and optimizations to address the aforementioned mismatch and boost end-to-end application performance. We introduce topology-aware data splitting, propose a novel communication topology, and incorporate asynchronous data movement and computation. We evaluate our design and optimizations using real and artificial datasets and compare its performance with that of the direct P2P and NCCL-based approaches. The results show that our optimizations always outperform the counterparts and achieve up to 5.13× and 1.63× communication and end-to-end application speedups, respectively.

查看原文本刊更多论文

多gpu平面图像重建的拓扑感知优化

印刷术是一种先进的高分辨率x射线成像技术，可以生成非常大的数据集。平面重建将互反空间实验数据转换为高分辨率的二维实空间图像。为了满足重建的计算需求，图形处理器被广泛使用。通用的多gpu重构方案使用MPI和NCCL库提供的P2P图、环等通用通信拓扑来建立gpu间通信。然而，这些常见的拓扑假设gpu之间的物理链路是均匀的，导致在由高速(例如NVLink)和低速(例如PCIe)互连组成的异构配置上的性能不是最优的。应用程序级通信拓扑与物理互连之间的这种不匹配可能导致数据传输拥塞、内存访问效率低下以及网络资源利用率不足。本文介绍了拓扑感知设计和优化，以解决上述不匹配问题并提高端到端应用程序性能。我们引入了拓扑感知的数据分割，提出了一种新的通信拓扑，并结合了异步数据移动和计算。我们使用真实和人工数据集评估我们的设计和优化，并将其性能与直接P2P和基于nccl的方法进行比较。结果表明，我们的优化总是优于同行，并分别实现高达5.13倍和1.63倍的通信和端到端应用程序加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing

自引率

0.00%

发文量