Proceedings. IPDPS (Conference): Latest Publications

Predicting and Comparing the Performance of Array Management Libraries.
Proceedings. IPDPS (Conference) Pub Date: 2020-05-01 Epub Date: 2020-07-14 DOI: 10.1109/ipdps47924.2020.00097
Donghe Kang, Oliver Rübel, Suren Byna, Spyros Blanas
Abstract: Many applications are increasingly becoming I/O-bound. To improve scalability, analytical models of parallel I/O performance are often consulted to determine possible I/O optimizations. However, I/O performance modeling has predominantly focused on applications that directly issue I/O requests to a parallel file system or a local storage device. These I/O models are not directly usable by applications that access data through standardized I/O libraries, such as HDF5, FITS, and NetCDF, because a single I/O request to an object can trigger a cascade of I/O operations to different storage blocks. The I/O performance characteristics of applications that rely on these libraries are a complex function of the underlying data storage model, user-configurable parameters, and object-level access patterns. As a consequence, I/O optimization is predominantly an ad hoc process performed by application developers, who are often domain scientists with limited desire to delve into the nuances of the storage hierarchy of modern computers. This paper presents an analytical cost model to predict the end-to-end execution time of applications that perform I/O through established array management libraries. The paper focuses on the HDF5 and Zarr array libraries as examples of I/O libraries with radically different storage models: HDF5 stores every object in one file, while Zarr creates multiple files to store different objects. We find that accessing array objects via these I/O libraries introduces new overheads and optimizations. Specifically, in addition to I/O time, it is crucial to model the cost of transforming data to a particular storage layout (memory copy cost), as well as the benefit of accessing a software cache. We evaluate the model on real applications that process observations (neuroscience) and simulation results (plasma physics). The evaluation on three HPC clusters reveals that I/O accounts for as little as 10% of the execution time in some cases, and hence models that only focus on I/O performance cannot accurately capture the performance of applications that use standard array storage libraries. In parallel experiments, our model correctly predicts the fastest storage library between HDF5 and Zarr 94% of the time, in contrast with 70% of the time for a cutting-edge I/O model.
Volume: 2020, Pages: 906-915
Citations: 5
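
The cost decomposition described in the abstract above (storage I/O plus a memory-copy cost, offset by a software cache) can be illustrated with a minimal sketch. The function below is a toy model with made-up parameter names and a simple linear form; it is an assumption-laden illustration, not the paper's actual formulation.

```python
import math

# Minimal sketch of an end-to-end cost estimate for a chunked array read,
# in the spirit of the abstract above; all names and the linear decomposition
# are illustrative assumptions, not the paper's model.
def predict_read_time(bytes_requested, chunk_bytes, io_bandwidth, io_latency,
                      memcpy_bandwidth, cache_hit_ratio=0.0):
    """Estimate seconds to read `bytes_requested` through a chunked array library.

    io_bandwidth / memcpy_bandwidth are in bytes per second, io_latency is the
    per-chunk request overhead in seconds, and cache_hit_ratio is the fraction
    of chunks served from a software cache (no storage I/O, only a memory copy).
    """
    chunks_touched = math.ceil(bytes_requested / chunk_bytes)
    chunks_from_storage = chunks_touched * (1.0 - cache_hit_ratio)

    io_time = chunks_from_storage * (io_latency + chunk_bytes / io_bandwidth)
    # Data still has to be rearranged from the chunked storage layout into the
    # caller's in-memory layout, whether or not it came from the cache.
    memcpy_time = chunks_touched * chunk_bytes / memcpy_bandwidth
    return io_time + memcpy_time

# Example: 1 GiB read, 4 MiB chunks, 2 GB/s storage, 8 GB/s memcpy, 25% cache hits.
t = predict_read_time(2**30, 4 * 2**20, 2e9, 1e-4, 8e9, cache_hit_ratio=0.25)
print(f"predicted read time: {t:.3f} s")
```
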
High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms.
Proceedings. IPDPS (Conference) Pub Date: 2013-05-01 DOI: 10.1109/IPDPS.2013.11
George Teodoro, Tony Pan, Tahsin M Kurc, Jun Kong, Lee A D Cooper, Norbert Podhorszki, Scott Klasky, Joel H Saltz
Abstract: Analysis of large pathology image datasets offers significant opportunities for the investigation of disease morphology, but the resource requirements of analysis pipelines limit the scale of such studies. Motivated by a brain cancer study, we propose and evaluate a parallel image analysis application pipeline for high-throughput computation of large datasets of high-resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we have built runtime support that allows us to express the cancer image analysis application as a hierarchical data processing pipeline. The application is implemented as a coarse-grain pipeline of stages, where each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are efficiently managed and scheduled for computation on CPUs and GPUs using performance-aware scheduling techniques along with several optimizations, including architecture-aware process placement, data-locality-conscious task assignment, data prefetching, and asynchronous data copy. These optimizations are employed to maximize the utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. Our experimental evaluation shows that the cooperative use of CPUs and GPUs achieves significant improvements on top of GPU-only versions (up to 1.6×) and that the execution of the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than the coarser-grain, monolithic implementations used in other works. An implementation of the cancer image analysis pipeline using the runtime support was able to process an image dataset consisting of 36,848 4K×4K-pixel image tiles (about 1.8 TB uncompressed) in less than 4 minutes (150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system.
Volume: 2013, Pages: 103-114
Citations: 0
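
As an illustration of the hierarchical pipeline idea in the entry above, the sketch below decomposes a coarse-grain stage into fine-grain operations that CPU- and GPU-bound workers drain from a shared queue. The "GPU" worker is only a thread placeholder, and none of the paper's runtime system, prefetching, or locality optimizations are reproduced.

```python
import queue
import threading

tasks = queue.Queue()

def segment_stage(tile):
    # Coarse-grain stage: decompose one image tile into fine-grain operations.
    for op in ("threshold", "morph_open", "label", "features"):
        tasks.put((op, tile))

def run_op(device, op, tile):
    # Placeholder for dispatching the operation to a real CPU or GPU kernel.
    print(f"{device}: {op} on tile {tile}")

def worker(device_name):
    while True:
        item = tasks.get()
        if item is None:          # poison pill: shut the worker down
            tasks.task_done()
            break
        op, tile = item
        run_op(device_name, op, tile)
        tasks.task_done()

# One CPU worker and one (simulated) GPU worker drain the same queue.
workers = [threading.Thread(target=worker, args=(dev,)) for dev in ("cpu-0", "gpu-0")]
for w in workers:
    w.start()
for tile in range(4):
    segment_stage(tile)
tasks.join()                      # wait for all fine-grain operations
for _ in workers:
    tasks.put(None)
for w in workers:
    w.join()
```
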
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems.
Proceedings. IPDPS (Conference) Pub Date: 2012-05-01 DOI: 10.1109/IPDPS.2012.101
George Teodoro, Tahsin M Kurc, Tony Pan, Lee A D Cooper, Jun Kong, Patrick Widener, Joel H Saltz
Abstract: The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make very high parallel computing power available at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either the GPU or the CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance-aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches.
Volume: 2012, Pages: 1093-1104
Citations: 0
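
A minimal sketch of performance-aware CPU-GPU assignment in the spirit of the entry above: given measured per-operation GPU speedups, the GPU is handed the work it accelerates most while the CPU absorbs the rest. The speedup numbers and the greedy split below are illustrative assumptions, not the paper's scheduler.

```python
# Illustrative performance-aware assignment: the GPU gets the operation types
# it accelerates most, so neither device sits idle. All numbers are made up.
def assign_operations(ops, gpu_speedup, gpu_share=0.5):
    """Split (op_type, cpu_cost) work items between CPU and GPU.

    gpu_speedup maps op_type -> measured GPU/CPU speedup; gpu_share is the
    fraction of total CPU-equivalent work the GPU should take.
    """
    # Consider the most GPU-friendly work first for GPU placement.
    ordered = sorted(ops, key=lambda o: gpu_speedup.get(o[0], 1.0), reverse=True)
    total = sum(cost for _, cost in ops)
    cpu_tasks, gpu_tasks, gpu_load = [], [], 0.0
    for op_type, cost in ordered:
        if gpu_load < gpu_share * total and gpu_speedup.get(op_type, 1.0) > 1.0:
            gpu_tasks.append((op_type, cost))
            gpu_load += cost
        else:
            cpu_tasks.append((op_type, cost))
    return cpu_tasks, gpu_tasks

ops = [("color_deconv", 2.0), ("watershed", 5.0), ("features", 3.0)] * 4
speedup = {"color_deconv": 6.0, "watershed": 1.2, "features": 3.5}
cpu_tasks, gpu_tasks = assign_operations(ops, speedup)
print(len(cpu_tasks), "ops on CPU,", len(gpu_tasks), "ops on GPU")
```
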
Parallel Mapping Approaches for GNUMAP.
Proceedings. IPDPS (Conference) Pub Date: 2011-01-01 DOI: 10.1109/ipdps.2011.184
Nathan L Clement, Mark J Clement, Quinn Snell, W Evan Johnson
Abstract: Mapping short next-generation reads to reference genomes is an important element in SNP calling and expression studies. Major limitations to large-scale whole-genome mapping are the large memory requirements of the algorithm and the long run time necessary for accurate studies. Several parallel implementations have been developed to distribute memory across different processors and to share the processing requirements equally. These approaches are compared with respect to their memory footprint, load balancing, and accuracy. When using MPI with multi-threading, linear speedup can be achieved for up to 256 processors.
Volume: 2011, Pages: 435-443
Citations: 11
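
The MPI-plus-multithreading pattern the abstract describes can be sketched with mpi4py: reads are scattered across ranks and each rank maps its share with a local thread pool. The map_read stub and the round-robin partition are hypothetical stand-ins for GNUMAP's actual alignment kernel and data distribution.

```python
# Run with e.g.: mpiexec -n 4 python map_reads.py   (requires mpi4py)
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI

def map_read(read):
    # Placeholder for probabilistic alignment against the shared reference.
    return (read, hash(read) % 1000)   # fake "position"

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    reads = [f"read_{i}" for i in range(1000)]
    chunks = [reads[i::size] for i in range(size)]   # round-robin partition
else:
    chunks = None

my_reads = comm.scatter(chunks, root=0)              # distribute the workload
with ThreadPoolExecutor(max_workers=4) as pool:      # threads share one index
    my_hits = list(pool.map(map_read, my_reads))

all_hits = comm.gather(my_hits, root=0)              # collect results at root
if rank == 0:
    print(sum(len(h) for h in all_hits), "reads mapped")
```
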
Optimization of Applications with Non-blocking Neighborhood Collectives via Multisends on the Blue Gene/P Supercomputer.
Proceedings. IPDPS (Conference) Pub Date: 2010-04-19 DOI: 10.1109/IPDPS.2010.5470407
Sameer Kumar, Philip Heidelberger, Dong Chen, Michael Hines
Abstract: We explore the multisend interface as a data mover interface to optimize applications with neighborhood collective communication operations. One of the limitations of the current MPI 2.1 standard is that the vector collective calls require counts and displacements (zero and nonzero bytes) to be specified for all the processors in the communicator. Further, all the collective calls in MPI 2.1 are blocking and do not permit overlap of communication with computation. We present the record-replay persistent optimization to the multisend interface that minimizes the processor overhead of initiating the collective. We present four different case studies with the multisend API on Blue Gene/P: (i) 3D-FFT, (ii) 4D nearest-neighbor exchange as used in Quantum Chromodynamics, (iii) NAMD, and (iv) the neural network simulator NEURON. Performance results show 1.9× speedup with 32³ 3D-FFTs, 1.9× speedup for the 4D nearest-neighbor exchange with the 2⁴ problem, 1.6× speedup in NAMD, and almost 3× speedup in NEURON with 256K cells and 1k connections/cell.
Volume: 2010, Pages: 1-11
Citations: 0
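
The multisend interface is specific to Blue Gene/P, but MPI-3 later standardized non-blocking neighborhood collectives that capture the same overlap idea. The sketch below (assuming mpi4py on top of an MPI-3 library) posts a halo exchange on a Cartesian communicator, overlaps it with interior work, and then waits; it is a rough analogue, not the paper's API.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 2)           # 2-D process grid
cart = comm.Create_cart(dims, periods=[True, True])   # periodic torus, 4 neighbors

n = 1024                                               # halo elements per neighbor
sendbuf = np.full((4, n), float(cart.Get_rank()))      # one row per neighbor
recvbuf = np.empty((4, n))

req = cart.Ineighbor_alltoall(sendbuf, recvbuf)        # non-blocking halo exchange
interior = np.random.rand(512, 512).sum()              # overlap: interior compute
req.Wait()                                             # halos are now in recvbuf

if cart.Get_rank() == 0:
    print("interior checksum", interior, "first halo value", recvbuf[0, 0])
```
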
Architectural Implications for Spatial Object Association Algorithms.
Proceedings. IPDPS (Conference) Pub Date: 2009-01-01 DOI: 10.1109/IPDPS.2009.5161078
Vijay S Kumar, Tahsin Kurc, Joel Saltz, Ghaleb Abdulla, Scott R Kohn, Celeste Matarazzo
Abstract: Spatial object association, also referred to as crossmatch of spatial datasets, is the problem of identifying and comparing objects in two or more datasets based on their positions in a common spatial coordinate system. In this work, we evaluate two crossmatch algorithms that are used for astronomical sky surveys on the following database system architecture configurations: (1) Netezza Performance Server®, a parallel database system with active-disk-style processing capabilities, (2) MySQL Cluster, a high-throughput network database system, and (3) a hybrid configuration consisting of a collection of independent database system instances with data replication support. Our evaluation provides insights into how the architectural characteristics of these systems affect the performance of the spatial crossmatch algorithms. We conducted our study using real use-case scenarios borrowed from a large-scale astronomy application known as the Large Synoptic Survey Telescope (LSST).
Pages: 1-12
Citations: 0
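
A zone or grid index is the standard way to avoid an all-pairs comparison in a spatial crossmatch: index one catalog on a coarse sky grid, then probe only nearby cells for each object of the other catalog. The sketch below is a deliberately simplified, single-node illustration of that idea (it ignores RA wrap-around and polar distortion) and is not either of the algorithms or database configurations evaluated in the paper.

```python
import math
from collections import defaultdict

def build_grid(catalog, cell_deg):
    # Bucket catalog objects by coarse (ra, dec) grid cell.
    grid = defaultdict(list)
    for obj_id, ra, dec in catalog:
        grid[(int(ra // cell_deg), int(dec // cell_deg))].append((obj_id, ra, dec))
    return grid

def angular_sep_deg(ra1, dec1, ra2, dec2):
    # Great-circle separation via the haversine formula, in degrees.
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def crossmatch(cat_a, cat_b, radius_deg):
    grid = build_grid(cat_b, cell_deg=radius_deg)
    matches = []
    for id_a, ra, dec in cat_a:
        cx, cy = int(ra // radius_deg), int(dec // radius_deg)
        for dx in (-1, 0, 1):                 # probe the 3x3 cell neighborhood
            for dy in (-1, 0, 1):
                for id_b, rb, db in grid.get((cx + dx, cy + dy), []):
                    if angular_sep_deg(ra, dec, rb, db) <= radius_deg:
                        matches.append((id_a, id_b))
    return matches

cat_a = [("a1", 10.001, -5.002), ("a2", 200.5, 33.3)]
cat_b = [("b1", 10.0015, -5.0025), ("b2", 180.0, 10.0)]
print(crossmatch(cat_a, cat_b, radius_deg=0.01))
```
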
Translational Research Design Templates, Grid Computing, and HPC.
Proceedings. IPDPS (Conference) Pub Date: 2008-05-01 DOI: 10.1109/IPDPS.2008.4536089
Joel Saltz, Scott Oster, Shannon Hastings, Stephen Langella, Renato Ferreira, Justin Permar, Ashish Sharma, David Ervin, Tony Pan, Umit Catalyurek, Tahsin Kurc
Abstract: Design templates that involve discovery, analysis, and integration of information resources commonly occur in many scientific research projects. In this paper we present examples of design templates from the biomedical translational research domain and discuss the requirements they impose on Grid middleware infrastructures. Using caGrid, a Grid middleware system based on the model-driven architecture (MDA) and service-oriented architecture (SOA) paradigms, as a starting point, we discuss architecture directions for MDA- and SOA-based systems like caGrid to support common design templates.
Volume: 2008, Pages: 1-15
Citations: 0
Comparison of Current BLAST Software on Nucleotide Sequences.
Proceedings. IPDPS (Conference) Pub Date: 2005-04-04 DOI: 10.1109/IPDPS.2005.145
I Elizabeth Cha, Eric C Rouchka
Abstract: The computational power needed for searching exponentially growing databases, such as GenBank, has increased dramatically. Three different implementations of the most widely used sequence alignment tool, known as BLAST (Basic Local Alignment Search Tool), are studied for their efficiency on nucleotide-nucleotide comparisons. The performance of these implementations is evaluated using target databases and query sequences of varying lengths and numbers of entries constructed from human genomic and EST sequences. In general, WU BLAST was found to be most efficient when the database and query composition are unknown. NCBI BLAST appears to work best when the database contains a small number of sequences, while mpiBLAST shows the power of database distribution when the number of bases per target database is large. The optimal number of compute nodes in mpiBLAST varies depending upon the database, yet in the cases studied it remains surprisingly low.
Volume: 19, Pages: 8
Citations: 0
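
A comparison like the one above comes down to timing each implementation on the same query/database pairs. The harness below is a hedged sketch: the blastn flags shown are from the current NCBI BLAST+ toolkit, and the commented mpiBLAST line is a placeholder, since the 2005-era NCBI/WU/mpiBLAST binaries and flags differ from what is installed today and should be adapted locally.

```python
import subprocess
import time

def time_command(cmd, repeats=3):
    """Run `cmd` several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best

query, db = "query.fa", "est_human"          # placeholders for real inputs
commands = {
    "ncbi-blast+": ["blastn", "-query", query, "-db", db],
    # "mpiblast": ["mpirun", "-np", "8", "mpiblast", ...],  # fill in locally
}

for name, cmd in commands.items():
    print(f"{name}: {time_command(cmd):.1f} s")
```
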
Orientation Refinement of Virus Structures with Unknown Symmetry.
Proceedings. IPDPS (Conference) Pub Date: 2003-04-22 DOI: 10.1109/IPDPS.2003.1213138
Yongchang Ji, Dan C Marinescu, Wei Zhang, Timothy S Baker
Abstract: Structural biology, in particular the structure determination of viruses and other large macromolecular complexes, leads to data- and compute-intensive problems that require resources well beyond those available on a single system. Thus, there is an imperative need to develop parallel algorithms and programs for clusters and computational grids. We present one of the most challenging computational problems posed by the three-dimensional structure determination of viruses: the orientation refinement.
Volume: 2003
Citations: 18
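
Orientation refinement is commonly cast as projection matching: score each particle image against projections of the current 3-D map over candidate orientations and keep the best. The mpi4py sketch below distributes images across ranks to illustrate that generic pattern only; project() is a stub, the data are random, and the paper's algorithm for structures with unknown symmetry is not reproduced.

```python
import numpy as np
from mpi4py import MPI

rng = np.random.default_rng(0)

def project(density, angles):
    # Placeholder; a real code would rotate and integrate the 3-D map.
    return rng.standard_normal((32, 32))

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

density = np.zeros((32, 32, 32))                        # current 3-D model
orientations = [(t, p, 0.0) for t in range(0, 180, 15) for p in range(0, 360, 30)]
my_images = [rng.standard_normal((32, 32)) for _ in range(100 // size + 1)]

refined = []
for img in my_images:
    # Correlation score of this image against each candidate projection.
    scores = [float(np.vdot(img, project(density, o))) for o in orientations]
    refined.append(orientations[int(np.argmax(scores))])

all_refined = comm.gather(refined, root=0)              # root collects assignments
if rank == 0:
    print(sum(len(r) for r in all_refined), "images assigned orientations")
```
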