Proceedings of the 34th ACM International Conference on Supercomputing: Latest Papers

MKPipe: a compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA
Proceedings of the 34th ACM International Conference on Supercomputing Pub Date : 2020-02-05 DOI: 10.1145/3392717.3392757
Ji Liu, A. Kafi, Xipeng Shen, Huiyang Zhou
Abstract: OpenCL for FPGA enables developers to design FPGAs using a programming model similar to that used for processors. Recent works have shown that code optimization at the OpenCL level is important to achieve high computational efficiency. However, existing works either focus primarily on optimizing single kernels or depend solely on channels to design multi-kernel pipelines. In this paper, we propose a source-to-source compiler framework, MKPipe, for optimizing multi-kernel workloads in OpenCL for FPGA. Besides channels, we propose new schemes to enable multi-kernel pipelines. Our optimizing compiler employs a systematic approach to explore the tradeoffs of these optimization methods. To enable more efficient overlapping between kernel executions, we also propose a novel workitem/workgroup-id remapping technique. Furthermore, we propose new algorithms for throughput balancing and resource balancing to tune the optimizations upon individual kernels in the multi-kernel workloads. Our results show that our compiler-optimized multi-kernels achieve up to 3.6x (1.4x on average) speedup over the baseline, in which the kernels have already been optimized individually.
Citations: 4
A scalable framework for solving fractional diffusion equations
Proceedings of the 34th ACM International Conference on Supercomputing Pub Date : 2019-11-27 DOI: 10.1145/3392717.3392769
Max Carlson, R. Kirby, H. Sundar
Abstract: The study of fractional order differential operators (involving non-integer derivative terms) is receiving renewed attention in many scientific fields, from photonics to speech modeling. While numerous scalable codes exist for solving integer-order partial differential equations (PDEs), the same is not true for fractional order PDEs. Therefore, there is a need for highly scalable numerical methods and codes for solving fractional order PDEs on complex geometries. The key challenge is that most approaches for fractional PDEs have at least quadratic complexity in both storage and compute, and are challenging to scale. We present a scalable framework for solving fractional diffusion equations using the method of eigenfunction expansion. This includes a scalable parallel algorithm to efficiently compute the full set of eigenvalues and eigenvectors for a discretized Laplace eigenvalue problem and apply them to construct approximate solutions to fractional order model problems. We demonstrate the efficacy of our methods by performing strong and weak scalability tests using complex geometries on TACC's Frontera compute cluster. We also show that our approach compares favorably against existing dense and sparse solvers. In our largest solve, we estimated half a million eigenpairs using 28,672 cores.
Citations: 2
AMOEBA: a coarse grained reconfigurable architecture for dynamic GPU scaling
Proceedings of the 34th ACM International Conference on Supercomputing Pub Date : 2019-11-08 DOI: 10.1145/3392717.3392738
Xianwei Cheng, Hui Zhao, M. Kandemir, Beilei Jiang, Gayatri Mehta
Abstract: Different GPU applications exhibit varying scalability patterns with network-on-chip (NoC), coalescing, memory and control divergence, and L1 cache behavior. A GPU consists of several Streaming Multi-processors (SMs) that collectively determine how shared resources are partitioned and accessed. Recent years have seen divergent paths in SM scaling towards scale-up (fewer, larger SMs) vs. scale-out (more, smaller SMs). However, neither scaling up nor scaling out can meet the scalability requirement of all applications running on a given GPU system, which inevitably results in performance degradation and resource under-utilization for some applications. In this work, we investigate major design parameters that influence GPU scaling. We then propose AMOEBA, a solution to GPU scaling through reconfigurable SM cores. AMOEBA monitors and predicts application scalability at run-time and adjusts the SM configuration to meet program requirements. AMOEBA also enables dynamic creation of heterogeneous SMs through independent fusing or splitting. AMOEBA is a microarchitecture-based solution and requires no additional programming effort or custom compiler support. Our experimental evaluations with application programs from various benchmark suites indicate that AMOEBA achieves a maximum performance gain of 4.3x and an average performance improvement of 47% across all benchmarks tested.
Citations: 2
Optimizing supercompilers for supercomputers
Proceedings of the 34th ACM International Conference on Supercomputing Pub Date : 1989-03-20 DOI: 10.1145/3392717.3400034
M. Wolfe
Abstract: Between a problem statement and its solution as a computer simulation are several steps: choosing a method, writing a program, compiling to machine code, making runtime decisions, and hardware execution. Here we will look at the middle three decision points. What decisions should be and must be left to the programmer? What decisions should be and must be relegated to a compiler? What decisions should be and must be left until runtime? Given my background, I will focus a great deal on the importance of compilers in supercomputing, and compare and contrast the advantages and impacts of compiler solutions to the "Performance + Portability + Productivity" problem with language and runtime solutions.
Citations: 38