Latest articles from the 2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)

No More Leaky PageRank
Scott Sallinen, M. Ripeanu
{"title":"No More Leaky PageRank","authors":"Scott Sallinen, M. Ripeanu","doi":"10.1109/IA354616.2021.00011","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00011","url":null,"abstract":"We have surveyed multiple PageRank implementations available with popular graph processing frameworks, and discovered that they treat sink vertices (i.e., vertices without outgoing edges) incorrectly. This leads to two issues: (i) incorrect PageRank scores, and (ii) flawed performance evaluations (as costly scatter operations are avoided). For synchronous PageRank implementations, a strategy to fix these issues exists (accumulating all values from sinks during an algorithmic superstep of a PageRank iteration), albeit with sizeable overhead. This solution, however, is not applicable in the context of asynchronous frameworks. We present and evaluate a novel, low-cost algorithmic solution to address this issue. For asynchronous PageRank, our key target, our solution simply requires an inexpensive O(Vertex) computation performed alongside the final normalization step. We also show that this strategy has advantages over prior work for synchronous PageRank, as it both avoids graph restructuring and reduces inline computation costs by performing a final score reassignment to vertices once at the end of processing.","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129485671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Proceedings of IA3 2021: Workshop on Irregular Applications: Architectures and Algorithms [Title page]
{"title":"Proceedings of IA3 2021: Workshop on Irregular Applications: Architectures and Algorithms [Title page]","authors":"","doi":"10.1109/ia354616.2021.00001","DOIUrl":"https://doi.org/10.1109/ia354616.2021.00001","url":null,"abstract":"","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124112197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Accelerating unstructured-grid CFD algorithms on NVIDIA and AMD GPUs
C. Stone, Aaron C. Walden, M. Zubair, E. Nielsen
{"title":"Accelerating unstructured-grid CFD algorithms on NVIDIA and AMD GPUs","authors":"C. Stone, Aaron C. Walden, M. Zubair, E. Nielsen","doi":"10.1109/IA354616.2021.00010","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00010","url":null,"abstract":"Computational performance of the FUN3D unstructured-grid computational fluid dynamics (CFD) application on GPUs is highly dependent on the efficiency of floating-point atomic updates needed to support the irregular cell-, edge-, and node-based data access patterns in massively parallel GPU environments. We examine several optimization methods to improve GPU efficiency of performance-critical kernels that are dominated by atomic update costs on NVIDIA V100/A100 and AMD CDNA MI100 GPUs. Optimization on the AMD MI100 GPU was of primary interest since similar hardware will be used in the upcoming Frontier supercomputer. Techniques combining register shuffling and on-chip shared memory were used to transpose and/or aggregate results amongst collaborating GPU threads before atomically updating global memory. These techniques, along with algorithmic optimizations to reduce the update frequency, reduced the run-time of select kernels on the MI100 GPU by a factor of between 2.5 and 6.0 over atomically updating global memory directly. Performance impact on the NVIDIA GPUs was mixed, with the performance of the V100 often degraded when using register-based aggregation/transposition techniques while the A100 generally benefited from these methods, though to a lesser extent than measured on the MI100 GPU. Overall, both V100 and A100 GPUs outperformed the MI100 GPU on kernels dominated by double-precision atomic updates; however, the techniques demonstrated here reduced the performance gap and improved the MI100 performance.","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129996719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Greatly Accelerated Scaling of Streaming Problems with A Migrating Thread Architecture
Brian A. Page, P. Kogge
{"title":"Greatly Accelerated Scaling of Streaming Problems with A Migrating Thread Architecture","authors":"Brian A. Page, P. Kogge","doi":"10.1109/IA354616.2021.00009","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00009","url":null,"abstract":"Applications where continuous streams of data are passed through large data structures are becoming of increasing importance. However, their execution on conventional architectures, especially when parallelism is desired to boost performance, is highly inefficient. The primary issue is often with the need to stream large numbers of disparate data items through the equivalent of very large hash tables distributed across many nodes. This paper builds on some prior work on the Firehose streaming benchmark where an emerging architecture using threads that can migrate through memory has been shown to be much more efficient at such problems. This paper extends that work to use a second generation system to not only show that same improved efficiency (10X) for larger core counts, but even significantly higher raw performance (with FPGA-based cores running at 1/10th the clock of conventional systems). Further, this additional data yields insight into what resources represent the bottlenecks to even more performance, and makes a reasonable projection that implementation of such an architecture with current technology would lead to 10X performance gain on an apples-to-apples basis with conventional systems.","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125511991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Sparse Exact Factorization Update
Jinhao Chen, T. Davis, Christopher Lourenco, Erick Moreno-Centeno
{"title":"Sparse Exact Factorization Update","authors":"Jinhao Chen, T. Davis, Christopher Lourenco, Erick Moreno-Centeno","doi":"10.1109/IA354616.2021.00012","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00012","url":null,"abstract":"To meet the growing need for extended or exact precision solvers, an efficient framework based on Integer-Preserving Gaussian Elimination (IPGE) has been recently developed which includes dense/sparse LU/Cholesky factorizations and dense LU/Cholesky factorization updates for column and/or row replacement. In this paper, we discuss our on-going work developing the sparse LU/Cholesky column/row-replacement update and the sparse rank-1 update/downdate. We first present some basic background for the exact factorization framework based on IPGE. Then we give our proposed algorithms along with some implementation and data-structure details. Finally, we provide some experimental results showcasing the performance of our update algorithms. Specifically, we show that updating these exact factorizations can be typically 10x to 100x faster than (re-)factorizing the matrices from scratch.","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132368924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
[Copyright notice]
{"title":"[Copyright notice]","authors":"","doi":"10.1109/ia354616.2021.00002","DOIUrl":"https://doi.org/10.1109/ia354616.2021.00002","url":null,"abstract":"","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129188438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards Scalable Data Processing in Python with CLIPPy
P. Pirkelbauer, Seth Bromberger, Keita Iwabuchi, R. Pearce
{"title":"Towards Scalable Data Processing in Python with CLIPPy","authors":"P. Pirkelbauer, Seth Bromberger, Keita Iwabuchi, R. Pearce","doi":"10.1109/IA354616.2021.00013","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00013","url":null,"abstract":"The Python programming language has become a popular choice for data scientists. While easy to use, the Python language is not well suited to drive data science on large-scale systems. This paper presents a first prototype of CLIPPy (Command line interface plus Python), a user-side class in Python that connects to high-performance computing environments with nonvolatile memory (NVM). CLIPPy queries available executable files and prepares a Python API on the fly. The executables can connect to a backend that executes on a large-scale system. The executables can be implemented in any language, for example in C++. CLIPPy and the executables are loosely coupled and communicate through a JSON based interface. By storing data in NVM, executables can attach and detach to data structures without expensive format conversions. The underlying philosophy, design challenges, and a prototype implementation that accesses data stored in non-volatile memory will be discussed.","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131017149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Mapping Irregular Computations for Molecular Docking to the SX-Aurora TSUBASA Vector Engine
Leonardo Solis-Vasquez, E. Focht, Andreas Koch
{"title":"Mapping Irregular Computations for Molecular Docking to the SX-Aurora TSUBASA Vector Engine","authors":"Leonardo Solis-Vasquez, E. Focht, Andreas Koch","doi":"10.1109/IA354616.2021.00008","DOIUrl":"https://doi.org/10.1109/IA354616.2021.00008","url":null,"abstract":"Molecular docking is a key method in computer-aided drug design, where the rapid identification of drug candidates is crucial for combating diseases. AutoDock is a widely-used molecular docking program, having an irregular structure characterized by a divergent control flow and compute-intensive calculations. This work investigates porting AutoDock to the SX-Aurora TSUBASA vector engine and evaluates the achievable performance on a number of real-world input compounds. In particular, we discuss the platform-specific coding styles required to handle the high degree of irregularity in both local-search methods employed by AutoDock. These Solis-Wets and ADADELTA methods take up a large part of the total computation time. Based on our experiments, we achieved runtimes on the SX-Aurora TSUBASA VE 20B that are on average 3x faster than on modern dual-socket 64-core CPU nodes. Our solution is competitive with V100 GPUs, even though these already use newer chip fabrication technology (12 nm vs. 16 nm on the VE 20B).","PeriodicalId":415158,"journal":{"name":"2021 IEEE/ACM 11th Workshop on Irregular Applications: Architectures and Algorithms (IA3)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131346018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1