{"title":"No More Leaky PageRank","authors":"Scott Sallinen, M. Ripeanu","doi":"10.1109/IA354616.2021.00011","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00011","url":null,"abstract":"We have surveyed multiple PageRank implementations available with popular graph processing frameworks, and discovered that they treat sink vertices (i.e., vertices without outgoing edges) incorrectly. This leads to two issues: (i) incorrect PageRank scores, and (ii) flawed performance evaluations (as costly scatter operations are avoided). For synchronous PageRank implementations, a strategy to fix these issues exists (accumulating all values from sinks during an algorithmic superstep of a PageRank iteration), albeit with sizeable overhead. This solution, however, is not applicable in the context of asynchronous frameworks. We present and evaluate a novel, low-cost algorithmic solution to address this issue. For asynchronous PageRank, our key target, our solution simply requires an inexpensive O(Vertex) computation performed alongside the final normalization step. We also show that this strategy has advantages over prior work for synchronous PageRank, as it both avoids graph restructuring and reduces inline computation costs by performing a final score reassignment to vertices once at the end of processing.","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129485671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of IA3 2021: Workshop on Irregular Applications: Architectures and Algorithms [Title page]","authors":"","doi":"10.1109/ia354616.2021.00001","DOIUrl":"https://doi.org/10.1109/ia354616.2021.00001","url":null,"abstract":"","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124112197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating unstructured-grid CFD algorithms on NVIDIA and AMD GPUs","authors":"C. Stone, Aaron C. Walden, M. Zubair, E. Nielsen","doi":"10.1109/IA354616.2021.00010","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00010","url":null,"abstract":"Computational performance of the FUN3D unstructured-grid computational fluid dynamics (CFD) application on GPUs is highly dependent on the efficiency of floating-point atomic updates needed to support the irregular cell-, edge-, and node-based data access patterns in massively parallel GPU environments. We examine several optimization methods to improve GPU efficiency of performance-critical kernels that are dominated by atomic update costs on NVIDIA V100/A100 and AMD CDNA MI100 GPUs. Optimization on the AMD MI100 GPU was of primary interest since similar hardware will be used in the upcoming Frontier supercomputer. Techniques combining register shuffling and on-chip shared memory were used to transpose and/or aggregate results amongst collaborating GPU threads before atomically updating global memory. These techniques, along with algorithmic optimizations to reduce the update frequency, reduced the run-time of select kernels on the MI100 GPU by a factor of between 2.5 and 6.0 over atomically updating global memory directly. Performance impact on the NVIDIA GPUs was mixed: the V100 was often degraded when using register-based aggregation/transposition techniques, while the A100 generally benefited from these methods, though to a lesser extent than measured on the MI100 GPU. Overall, both V100 and A100 GPUs outperformed the MI100 GPU on kernels dominated by double-precision atomic updates; however, the techniques demonstrated here reduced the performance gap and improved the MI100 performance.","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129996719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Greatly Accelerated Scaling of Streaming Problems with A Migrating Thread Architecture","authors":"Brian A. Page, P. Kogge","doi":"10.1109/IA354616.2021.00009","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00009","url":null,"abstract":"Applications where continuous streams of data are passed through large data structures are becoming of increasing importance. However, their execution on conventional architectures, especially when parallelism is desired to boost performance, is highly inefficient. The primary issue is often with the need to stream large numbers of disparate data items through the equivalent of very large hash tables distributed across many nodes. This paper builds on some prior work on the Firehose streaming benchmark, where an emerging architecture using threads that can migrate through memory has been shown to be much more efficient at such problems. This paper extends that work to use a second-generation system to not only show that same improved efficiency (10X) for larger core counts, but even significantly higher raw performance (with FPGA-based cores running at 1/10th the clock of conventional systems). Further, this additional data yields insight into which resources represent the bottlenecks to even more performance, and makes a reasonable projection that implementation of such an architecture with current technology would lead to a 10X performance gain on an apples-to-apples basis with conventional systems.","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125511991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Exact Factorization Update","authors":"Jinhao Chen, T. Davis, Christopher Lourenco, Erick Moreno-Centeno","doi":"10.1109/IA354616.2021.00012","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00012","url":null,"abstract":"To meet the growing need for extended or exact precision solvers, an efficient framework based on Integer-Preserving Gaussian Elimination (IPGE) has been recently developed which includes dense/sparse LU/Cholesky factorizations and dense LU/Cholesky factorization updates for column and/or row replacement. In this paper, we discuss our on-going work developing the sparse LU/Cholesky column/row-replacement update and the sparse rank-1 update/downdate. We first present some basic background for the exact factorization framework based on IPGE. Then we give our proposed algorithms along with some implementation and data-structure details. Finally, we provide some experimental results showcasing the performance of our update algorithms. Specifically, we show that updating these exact factorizations can typically be 10x to 100x faster than (re-)factorizing the matrices from scratch.","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132368924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Scalable Data Processing in Python with CLIPPy","authors":"P. Pirkelbauer, Seth Bromberger, Keita Iwabuchi, R. Pearce","doi":"10.1109/IA354616.2021.00013","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00013","url":null,"abstract":"The Python programming language has become a popular choice for data scientists. While easy to use, the Python language is not well suited to drive data science on large-scale systems. This paper presents a first prototype of CLIPPy (Command line interface plus Python), a user-side class in Python that connects to high-performance computing environments with nonvolatile memory (NVM). CLIPPy queries available executable files and prepares a Python API on the fly. The executables can connect to a backend that executes on a large-scale system. The executables can be implemented in any language, for example in C++. CLIPPy and the executables are loosely coupled and communicate through a JSON based interface. By storing data in NVM, executables can attach and detach to data structures without expensive format conversions. The underlying philosophy, design challenges, and a prototype implementation that accesses data stored in non-volatile memory will be discussed.","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131017149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping Irregular Computations for Molecular Docking to the SX-Aurora TSUBASA Vector Engine","authors":"Leonardo Solis-Vasquez, E. Focht, Andreas Koch","doi":"10.1109/IA354616.2021.00008","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00008","url":null,"abstract":"Molecular docking is a key method in computer-aided drug design, where the rapid identification of drug candidates is crucial for combating diseases. AutoDock is a widely-used molecular docking program, having an irregular structure characterized by a divergent control flow and compute-intensive calculations. This work investigates porting AutoDock to the SX-Aurora TSUBASA vector engine and evaluates the achievable performance on a number of real-world input compounds. In particular, we discuss the platform-specific coding styles required to handle the high degree of irregularity in both local-search methods employed by AutoDock. These Solis-Wets and ADADELTA methods take up a large part of the total computation time. Based on our experiments, we achieved runtimes on the SX-Aurora TSUBASA VE 20B that are on average 3x faster than on modern dual-socket 64-core CPU nodes. Our solution is competitive with V100 GPUs, even though these already use newer chip fabrication technology (12 nm vs. 16 nm on the VE 20B).","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131346018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}