2022 IEEE High Performance Extreme Computing Conference (HPEC): Latest Publications

A High Throughput Hardware Accelerator for FFTW Codelets: A First Look
2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926333
L. Pileggi, Siyuan Chen, Keshav Harisrikanth, Guanglin Xu, K. Mai, F. Franchetti
Abstract: The Fast Fourier Transform (FFT) is a critical computation for numerous applications in science and engineering. Its implementation has been widely studied and optimized on various computing platforms, with the FFTW library becoming the standard interface in HPC. In this work, we propose hardware acceleration of the FFTW library by putting a software codelet into hardware. The hardware is exposed to the user through an FFTW-compatible software library, while the actual computation takes place behind the scenes on a custom accelerator. As a first look at this idea, we design a high-throughput accelerator for FFTW twiddle codelets. The FFT hardware is automatically generated using SPIRAL, and a test chip is fabricated in a TSMC 28nm process. We provide measured results from the test chip and discuss many opportunities for future work.
Citations: 0
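For context on the term: in a Cooley-Tukey FFT, a "twiddle" step multiplies intermediate results by complex roots of unity (twiddle factors) and then combines them with a small DFT; FFTW generates such steps as specialized twiddle codelets. The minimal Python sketch below shows the radix-2 case of that step inside a recursive FFT and checks it against a naive DFT. It only illustrates the arithmetic; it is not FFTW's codelet interface and not the SPIRAL-generated hardware described in the paper, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def twiddle_butterfly(a, b, w):
    """Radix-2 twiddle step: scale b by twiddle factor w, then a 2-point DFT."""
    t = w * b
    return a + t, a - t


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        out[k], out[k + n // 2] = twiddle_butterfly(even[k], odd[k], w)
    return out
```

Calling `fft_radix2([1, 0, 0, 0])` returns four ones, as expected for the DFT of a unit impulse. A hardware twiddle codelet would fix the radix and stream many such butterflies in parallel rather than recursing.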
GPU-Accelerated High-Bandwidth Radar Centroiding
2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926364
D. Brigada, Maximilian Merfeld, Kara Warner
Abstract: Radar signal processing is a computationally intensive task, especially for high-bandwidth systems. Traditionally, such systems have relied on interleaving processing across multiple nodes of large compute clusters to achieve the necessary throughput. Developments in general-purpose GPU computing have led to a massive increase in the computational power available to highly parallel tasks, and most parts of the radar signal processing pipeline are well suited to such hardware. This paper describes an algorithm for centroiding, a key part of the search radar pipeline that has not yet been demonstrated on a GPU. With this centroiding algorithm, the entire high-data-rate portion of the processing pipeline can run on the GPU, yielding a speedup factor of approximately 40. The primary benefit of this approach is a massive reduction in data copying from the GPU to the CPU (a factor of over 1200 in this case), alleviating the main barrier to GPU-based radar processing systems.
Citations: 0
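Centroiding collapses a cluster of adjacent above-threshold detection cells into a single target report. As a rough serial illustration (the abstract does not describe the paper's GPU algorithm), the sketch below groups 4-connected cells of a range-Doppler power map with a breadth-first flood fill and returns each cluster's power-weighted centroid. The 4-connectivity, the power weighting, the requirement of a positive threshold, and the function name are assumptions chosen for clarity.

```python
from collections import deque


def centroid_detections(power, threshold):
    """Group cells with power >= threshold (threshold assumed > 0) into
    4-connected clusters and return each cluster's power-weighted
    (row, col) centroid, in row-major order of first discovery."""
    rows, cols = len(power), len(power[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or power[r0][c0] < threshold:
                continue
            # Breadth-first flood fill over neighbouring detections.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            total = sum_r = sum_c = 0.0
            while queue:
                r, c = queue.popleft()
                p = power[r][c]
                total += p
                sum_r += p * r
                sum_c += p * c
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and power[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            centroids.append((sum_r / total, sum_c / total))
    return centroids
```

The data-copy argument in the abstract follows from the output size: a full power map has rows × cols values, while the centroid list has only one pair per target, so keeping this step on the GPU means only the short list has to cross to the CPU.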
Kv2vec: A Distributed Representation Method for Key-value Pairs from Metadata Attributes
2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926389
Chenxu Niu, Wei Zhang, S. Byna, Yong Chen
Abstract: Distributed representation methods for words have been developed for years, and numerous methods exist, such as word2vec, GloVe, and fastText. However, they are not designed for key-value pairs, an important data pattern that is widely used in many scenarios. For example, the metadata attributes of scientific files consist of a collection of key-value pairs. In this research, we propose kv2vec, a method that captures relationships between keys and values and represents key-value pairs as dense vectors. The fundamental idea of kv2vec is to use recurrent neural networks (RNNs) with long short-term memory (LSTM) hidden units to convert each key-value pair into a distributed vector representation. This new method overcomes the weaknesses of existing embedding models in representing key-value pairs as vectors. Moreover, it can be integrated into dataset search solutions that query the metadata attributes of the self-describing file formats widely used in HPC systems. We evaluate kv2vec on multiple real-world datasets, and the results show that kv2vec outperforms existing models.
Citations: 0
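To make "an LSTM converts each key-value pair into a vector" concrete, the sketch below runs a single untrained LSTM layer, written directly in NumPy, over the characters of a key-value pair and returns the final hidden state as the embedding. The character-level `key=value` serialization, the layer sizes, the random weights, and the `TinyLSTMEncoder` name are all assumptions; the abstract does not describe kv2vec's input encoding or training objective, and a real model would learn these weights.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


class TinyLSTMEncoder:
    """Hypothetical, untrained single-layer LSTM encoder for key-value pairs."""

    def __init__(self, hidden=16, vocab=128, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        self.vocab = vocab
        self.embed = rng.normal(0.0, 0.1, (vocab, hidden))  # character embeddings
        # Stacked weights for the input, forget, candidate and output gates,
        # applied to the concatenation [x_t, h_{t-1}].
        self.W = rng.normal(0.0, 0.1, (4 * hidden, 2 * hidden))
        self.b = np.zeros(4 * hidden)

    def encode(self, key, value):
        """Read 'key=value' character by character; return the final hidden state."""
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        for ch in f"{key}={value}":
            x = self.embed[ord(ch) % self.vocab]
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
            c = f * c + i * g   # update the cell state
            h = o * np.tanh(c)  # expose a gated view of the cell state
        return h
```

For example, `TinyLSTMEncoder().encode("units", "kelvin")` yields a 16-dimensional vector; because the key and value pass through the same recurrent state, the vector depends on both, which is the property the abstract attributes to kv2vec.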
Resource-Constrained Optimizations For Synthetic Aperture Radar On-Board Image Processing
2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926327
Maron Schlemon, M. Schulz, R. Scheiber
Abstract: Synthetic Aperture Radar (SAR) can be used to create realistic, high-resolution 2D or 3D reconstructions of landscapes. Data capture is typically performed by radar instruments on specially equipped, low-flying planes, producing a large amount of raw data that must be processed for image reconstruction. However, because of the limited on-board resources of the plane (power, size, weight, cooling, communication bandwidth to ground stations, etc.) and the need to capture many images during a single flight, the raw data must be processed on board and then sent efficiently to the ground station as image products. In this paper we describe the processing architecture of the digital beamforming SAR (DBFSAR) of the German Aerospace Center (DLR) and the special steps that had to be taken to enable on-board processing. We explain the required software optimizations and the conditions under which their integration into the SAR imaging process leads to (near) real-time capability. We further describe the lessons learned in our work and discuss how they can be applied to other processing scenarios with limited resource availability.
Citations: 0
Optimizing Designs Using Several Types of Memories on Modern FPGAs
2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926306
Mehmet Gungor, Kai Huang, Stratis Ioannidis, M. Leeser
Abstract: Modern FPGAs targeting data centers are designed to accelerate problems with large data. They offer many different types of memory, including on-chip and on-board memories. A recent addition is High Bandwidth Memory (HBM), whose advantages have been demonstrated by others. However, little research has examined how interactions among different memory types impact application performance. We investigate how a combination of HBM and on-chip memory (BRAM or URAM) impacts clock rate and overall application latency. In these designs, the on-chip memory serves as a cache for the larger amounts of data stored in HBM. Our experiments show that as the size of data stored in BRAM or URAM increases, the achievable clock speed decreases, which in turn may degrade performance. We examine Garbled Circuits, an implementation of Secure Function Evaluation (SFE) with high memory demands and out-of-order data access, and study how different choices of BRAM, URAM, and HBM usage alter its performance.
Citations: 0
AI and ML Accelerator Survey and Trends
2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926331
A. Reuther, P. Michaleas, Michael Jones, V. Gadepally, S. Samsi, J. Kepner
Abstract: This paper updates the survey of AI accelerators and processors from the past three years. It collects and summarizes the current commercial accelerators that have been publicly announced with peak performance and power consumption numbers. The performance and power values are plotted on a scatter graph, and a number of dimensions and observations from the trends on this plot are again discussed and analyzed. This year's paper includes two new trend plots based on accelerator release dates, along with additional trends for some neuromorphic, photonic, and memristor-based inference accelerators.
Citations: 21
HuGraph: Acceleration of GCN Training on Heterogeneous FPGA Clusters with Quantization
2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926312
Letian Zhao, Qizhe Wu, Xiaotian Wang, Teng Tian, Wei Wu, Xi Jin
Abstract: Graph convolutional networks (GCNs) have succeeded in numerous fields, but the demand for higher performance and energy efficiency when training GCNs on larger graphs continues unabated. At the same time, because reconfigurable accelerators can customize computing modules and data movement at a fine grain, FPGAs can solve problems such as irregular memory access in GCN computing. Furthermore, to scale GCN computation, the use of heterogeneous FPGAs is inevitable given the constant iteration of new FPGA generations. In this paper, we propose HuGraph, a novel framework that automatically maps GCN training onto heterogeneous FPGA clusters. With HuGraph, FPGAs work in synchronous data parallelism using a simple 1D ring topology that is suitable for most off-the-shelf FPGA clusters. HuGraph uses three approaches to improve performance and energy efficiency. First, HuGraph applies full-process quantization to neighbor-sampling-based data-parallel training, reducing computation and memory consumption. Second, a novel balanced sampler balances workloads among heterogeneous FPGAs so that FPGAs with fewer resources do not become bottlenecks in the cluster. Third, HuGraph schedules the execution order of GCN training to minimize time overhead. We implement a prototype on a single FPGA and evaluate cluster-level performance with a cycle-accurate simulator. Experiments show that HuGraph achieves up to 102.3×, 4.62×, and 11.1× speedup compared with state-of-the-art works on CPU, GPU, and FPGA platforms, respectively, with negligible accuracy loss.
Citations: 1
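The abstract does not specify how HuGraph's balanced sampler works. One simple way to express its stated goal, keeping FPGAs with fewer resources from becoming bottlenecks, is to split each global minibatch of target nodes in proportion to each device's relative throughput. The sketch below does that with largest-remainder rounding; `balanced_split` and the throughput values are hypothetical and are not HuGraph's actual sampler.

```python
def balanced_split(batch_size, throughputs):
    """Split batch_size target nodes across devices in proportion to their
    relative throughput, using largest-remainder rounding so the shares
    sum exactly to batch_size."""
    total = sum(throughputs)
    quotas = [batch_size * t / total for t in throughputs]
    shares = [int(q) for q in quotas]  # floor of each ideal quota
    leftover = batch_size - sum(shares)
    # Give the remaining nodes to the devices with the largest fractional parts.
    order = sorted(range(len(quotas)),
                   key=lambda i: quotas[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

For instance, with one device twice as fast as two others, `balanced_split(1000, [1.0, 2.0, 1.0])` assigns 250, 500, and 250 nodes, so in a synchronous data-parallel step all three devices would ideally finish at about the same time.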
Kalman Filter Driven Estimation of Community Structure in Time Varying Graphs
2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926358
L. Durbeck, P. Athanas
Abstract: Community detection is an NP-hard graph problem that has been the subject of decades of research, and efficient methods are needed for time-varying graphs. In this paper we propose and evaluate a method for approximating the latent block structure within a time-varying graph using a Kalman filter. The method breaks a stream of graph updates into samples of sufficient size, each forming a graph $G_{t}$, and has the desirable feature that it accurately updates its representation of the latent block structure using a relatively small amount of information: the block structure predicted at step $t-1$ and the current data-stream sample $G_{t}$. This paper details the underlying system of linear equations used here to represent community detection, which achieves 97% accuracy in estimating the latent block representation as the community structure changes. This is demonstrated on synthetic graphs with time-varying block structure, generated by a hybrid mixed-model stochastic block model from the DARPA/MIT Graph Challenge.
Citations: 1
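The abstract's key point is that each update needs only the previous estimate and the current sample $G_{t}$. The sketch below shows that predict-update pattern with a textbook scalar Kalman filter applied independently to every entry of a block-to-block edge-density matrix, assuming a random-walk model for how the structure drifts. The paper's actual system of linear equations is not given in the abstract, so the per-entry independence, the noise variances `q` and `r`, and the function names are assumptions.

```python
def kalman_step(x_prev, p_prev, z, q, r):
    """One scalar Kalman filter step under a random-walk state model.
    x_prev, p_prev: estimate and variance carried over from step t-1
    z: observation derived from the current sample G_t
    q: process noise variance (how fast the block structure drifts)
    r: measurement noise variance (how noisy a single sample is)"""
    # Predict: a random walk keeps the mean and inflates the variance.
    x_pred, p_pred = x_prev, p_prev + q
    # Update: blend prediction and observation using the Kalman gain.
    k = p_pred / (p_pred + r)
    return x_pred + k * (z - x_pred), (1 - k) * p_pred


def update_block_matrix(est, var, observed, q=0.01, r=0.1):
    """Filter every entry of a square block-interaction matrix independently."""
    n = len(est)
    new_est = [[0.0] * n for _ in range(n)]
    new_var = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            new_est[i][j], new_var[i][j] = kalman_step(
                est[i][j], var[i][j], observed[i][j], q, r)
    return new_est, new_var
```

The gain `k` weighs the two sources by their uncertainty: a noisy sample (large `r`) moves the estimate only slightly, which is how a filter can track changing communities from small samples without recomputing from scratch.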
Fast Graph Algorithms for Superpixel Segmentation
2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926359
D. Floros, Tiancheng Liu, N. Pitsianis, Xiaobai Sun
Abstract: We introduce SLAM (simultaneous local assortative mixing), a novel graph-based algorithm for fast, high-quality superpixel segmentation of any large color image. Superpixels are compact semantic image elements; superpixel segmentation is fundamental to a broad range of vision tasks in existing and emerging applications, especially safety-critical and time-critical ones. SLAM leverages a graph representation of the image, which encodes pixel features and similarities, for its rich potential in implicit feature transformation and its additional means of feature differentiation and association at multiple resolution scales. Our experimental results on 500 benchmark images demonstrate that SLAM outperforms state-of-the-art algorithms in superpixel quality, by multiple measures, within the same time frame. The contributions are at least twofold: SLAM breaks down the long-standing speed barriers of graph-based algorithms for superpixel segmentation, and it lifts fundamental limitations of feature-point-based algorithms.
Citations: 1
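Graph-based superpixel methods start from a graph whose vertices are pixels and whose edge weights encode feature similarity. The abstract does not say how SLAM builds its graph, so the sketch below uses a common generic construction instead: a 4-connected pixel grid with Gaussian color-similarity weights. The bandwidth `sigma`, the plain-list image format, and the function name are assumptions for illustration.

```python
import math


def pixel_graph(image, sigma=10.0):
    """Build a 4-connected pixel graph for an RGB image given as a list of
    rows of (r, g, b) tuples. Returns {(u, v): weight} with flat pixel
    indices u < v and weight exp(-||c_u - c_v||^2 / (2 * sigma^2)), so
    similar neighbouring colours get weights close to 1."""
    h, w = len(image), len(image[0])
    edges = {}
    for y in range(h):
        for x in range(w):
            u = y * w + x  # flat index of the current pixel
            for dy, dx in ((0, 1), (1, 0)):  # right and down: each edge once
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    d2 = sum((a - b) ** 2
                             for a, b in zip(image[y][x], image[ny][nx]))
                    edges[(u, ny * w + nx)] = math.exp(-d2 / (2 * sigma ** 2))
    return edges
```

An h × w image yields h(w - 1) + w(h - 1) edges; a segmentation algorithm then groups pixels joined by heavy edges, which is where multi-scale methods such as the one described above do their work.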
Deep Gaussian process with multitask and transfer learning for performance optimization
2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI: 10.1109/HPEC55821.2022.9926396
Wissam M. Sid-Lakhdar, M. Aznaveh, P. Luszczek, J. Dongarra
Abstract: We combine Deep Gaussian Processes with multitask and transfer learning for the performance modeling and optimization of HPC applications. Deep Gaussian processes merge the uncertainty-quantification advantage of Gaussian processes with the predictive power of deep learning. Multitask learning improves learning efficiency when several similar tasks are learned simultaneously, and transfer learning does so when previously learned models are used to help learn new tasks. A comparison with state-of-the-art autotuners shows the advantage of our approach on two application problems.
Citations: 0
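Deep Gaussian processes stack Gaussian-process layers, and the multitask and transfer-learning machinery is not described in the abstract. The building block that supplies the uncertainty quantification mentioned above is ordinary GP regression, sketched here with NumPy for a one-dimensional tuning parameter. This is a single-layer, single-task GP with a squared-exponential kernel and hypothetical hyperparameters, not the authors' model.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-4, **kw):
    """Posterior mean and variance of a zero-mean GP at the points x_test."""
    k_xx = rbf_kernel(x_train, x_train, **kw) + noise * np.eye(len(x_train))
    k_sx = rbf_kernel(x_test, x_train, **kw)
    k_ss = rbf_kernel(x_test, x_test, **kw)
    L = np.linalg.cholesky(k_xx)  # k_xx = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # k_xx^{-1} y
    mean = k_sx @ alpha
    v = np.linalg.solve(L, k_sx.T)  # L^{-1} k_xs
    var = np.diag(k_ss) - np.sum(v ** 2, axis=0)
    return mean, var
```

In an autotuning loop, `x_train` would hold already-measured parameter values and `y_train` their runtimes. The posterior variance grows away from measured points, which is what lets a tuner balance exploring uncertain configurations against exploiting ones predicted to be fast.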