Latest Publications: 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Machine Learning Aided Hardware Resource Estimation for FPGA DNN Implementations
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00022
D. Diaconu, L. Petrica, Michaela Blott, M. Leeser
Abstract: This paper explores methods of improving hardware resource estimation for the implementation of Deep Neural Networks (DNNs) on FPGAs using machine learning algorithms. Current approaches operate at the DNN and High Level Synthesis (HLS) levels. At the DNN level, most techniques are strictly analytical, based on rough approximations and assumptions about the FPGA DNN implementation. The aim of this work is to facilitate design space exploration by providing more accurate resource estimates before running time-consuming processes such as HLS or logic synthesis. We integrated the algorithms into FINN, an end-to-end framework for building Quantized Neural Network (QNN) FPGA inference accelerators, in order to evaluate and compare them against existing estimates as well as the actual synthesized designs. We generated Support Vector Regression (SVR) models for LUT and BRAM estimation; the former yields promising results, while the latter consistently underperforms compared to the HLS and analytical FINN estimates. Combining the analytical approach used in FINN with SVR LUT estimation produced more accurate results because, on its own, SVR had insufficient extrapolation capability. For BRAM estimation, we improved the analytical approach by using a Decision Tree Classifier to predict whether a memory is implemented in distributed memory or BRAM.
Citations: 0
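The LUT regression idea lends itself to a compact illustration. Below is a minimal, hypothetical sketch of an SVR-based LUT estimator in the spirit of the abstract, using scikit-learn; the per-layer feature names and the synthetic targets are illustrative assumptions, not FINN's actual feature set or training data.

```python
# Hedged sketch: SVR-based LUT estimation on hypothetical per-layer features.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical training rows: [input_width_bits, weight_width_bits, PE_count, SIMD_lanes].
X = rng.integers(1, 64, size=(200, 4)).astype(float)
# Stand-in LUT counts; in practice these would come from post-synthesis reports.
y = 50 * X[:, 0] * X[:, 1] + 300 * X[:, 2] + rng.normal(0, 500, 200)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=10.0))
model.fit(X, y)

# Predict LUT usage for an unseen layer configuration.
print(model.predict([[8, 4, 16, 32]]))
```

As the abstract notes, a purely learned model extrapolates poorly outside the training range, which is why the paper combines it with FINN's analytical estimate.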
Families of Butterfly Counting Algorithms for Bipartite Graphs
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00060
Jay A. Acosta, Tze Meng Low, D. Parikh
Abstract: Butterflies are an important motif found in bipartite graphs that provides a structural way of finding dense regions within the graph. Beyond counting and enumerating butterflies, other metrics and peeling algorithms for bipartite graphs are designed around the butterfly motif. The importance of counting butterflies has led to many works on efficient implementations of butterfly counting under particular situational or hardware constraints. However, most algorithms first count the building block of the butterfly motif and from that calculate the total possible number of butterflies in the graph. In this paper, using a linear algebra approach, we show that many provably correct algorithms for counting butterflies can be systematically derived. Moreover, we show how this formulation facilitates butterfly peeling algorithms that find the k-tip and k-wing subgraphs within a bipartite graph.
Citations: 4
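The linear-algebra formulation builds on a standard identity: for a biadjacency matrix A, every pair of same-side vertices sharing w common neighbors closes C(w, 2) butterflies. A minimal generic sketch of that identity (not the authors' derivation framework) follows.

```python
# Hedged sketch: wedge-based butterfly counting via linear algebra.
import numpy as np
from scipy.special import comb

def count_butterflies(A: np.ndarray) -> int:
    """A is the 0/1 biadjacency matrix of a bipartite graph (rows = one side)."""
    W = A @ A.T                        # W[u, v] = common neighbors of rows u and v
    iu = np.triu_indices_from(W, k=1)  # distinct unordered pairs u < v
    # Each pair with w common neighbors closes C(w, 2) butterflies.
    return int(comb(W[iu], 2).sum())

# Tiny check: K_{2,2} contains exactly one butterfly.
A = np.array([[1, 1], [1, 1]])
print(count_butterflies(A))  # -> 1
```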
A Scalable Pipeline for Gigapixel Whole Slide Imaging Analysis on Leadership Class HPC Systems
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00223
Sajal Dash, Benjamín Hernández, A. Tsaris, Folami T. Alamudun, Hong-Jun Yoon, Feiyi Wang
Abstract: Whole Slide Imaging (WSI) captures microscopic details of a patient's histopathological features at multiple resolutions organized across different levels. Images produced by WSI are gigapixel-sized, and saving a single image in memory requires a few gigabytes, memory that is scarce since a complex model alone occupies tens of gigabytes. Performing even a simple metric operation on these large images is expensive. High-performance computing (HPC) can help us quickly analyze such large images using distributed training of complex deep learning models. One popular approach to analyzing these images is to divide a WSI image into smaller tiles (patches) and then train a simpler model with these reduced-size but numerous patches. However, pursuing this patch-based approach requires efficiently solving three pre-processing challenges. 1) Creating small patches from a high-resolution image can yield a large number of patches (hundreds of thousands per image), and storing and processing them is challenging due to the many I/O and arithmetic operations involved; an optimal balance between the size and number of patches is needed to reduce I/O and memory accesses. 2) WSI images may have tiny annotated regions of cancer tissue and a significant portion of normal and fatty tissue, so correct patch sampling must avoid dataset imbalance. 3) Storing and retrieving many patches to and from disk storage may incur I/O latency while training a deep learning model; an efficient distributed data loader should reduce I/O latency during the training and inference steps. This paper explores these three challenges and provides empirical and algorithmic solutions deployed on the Summit supercomputer hosted at the Oak Ridge Leadership Computing Facility.
Citations: 3
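As a rough illustration of challenge 2, the sketch below tiles a slide and draws a class-balanced sample of patches. The tumor-mask convention, function names, and all sizes are illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sketch: tiling plus class-balanced patch sampling for a WSI-style image.
import numpy as np

def tile_coords(height, width, patch, stride):
    """Top-left coordinates of all patches that fit inside the slide."""
    return [(y, x) for y in range(0, height - patch + 1, stride)
                   for x in range(0, width - patch + 1, stride)]

def balanced_sample(coords, mask, patch, n_per_class, rng):
    """Split patches by annotation and draw the same number from each class."""
    tumor, normal = [], []
    for (y, x) in coords:
        (tumor if mask[y:y + patch, x:x + patch].any() else normal).append((y, x))
    def take(pool):
        idx = rng.choice(len(pool), size=min(n_per_class, len(pool)), replace=False)
        return [pool[i] for i in idx]
    return take(tumor), take(normal)

rng = np.random.default_rng(0)
mask = np.zeros((4096, 4096), dtype=bool)
mask[1000:1200, 2000:2300] = True              # toy annotated tumor region
coords = tile_coords(4096, 4096, patch=256, stride=256)
tumor, normal = balanced_sample(coords, mask, 256, n_per_class=8, rng=rng)
print(len(tumor), len(normal))
```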
4th Workshop on Parallel AI and Systems for the Edge
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00156
Citations: 0
Parallel Bayesian Optimization for Optimal Scheduling of Underground Pumped Hydro-Energy Storage Systems
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00133
M. Gobert, Jan Gmys, J. Toubeau, N. Melab, D. Tuyttens, F. Vallée
Abstract: Underground Pumped Hydro-Energy Storage stations are sustainable options to enhance storage capacity and thus the flexibility of energy systems. Efficient management of such units requires high-performance optimization algorithms able to find solutions within a very restricted time frame to comply with responsive energy markets. In this context, parallel computing offers a valuable solution to ensure appropriate decisions that maximize the profit of the station operator, while guaranteeing the safety of the energy network. This study investigates the use of three existing algorithms in Parallel Bayesian Optimization, namely q-EGO, BSP-EGO, and TuRBO. The three algorithms have different inherent behaviors in terms of parallel potential and, even though TuRBO scales better, q-EGO remains the best choice with respect to the final outcome for all investigated batch sizes, achieving up to 5 times more profit than the other approaches.
Citations: 0
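For readers unfamiliar with batch Bayesian optimization, the sketch below shows a generic "constant liar" batching heuristic of the kind often used in q-EGO-style methods: the surrogate is refit q times, pretending each proposed point returned the current best value, so q candidates can be evaluated in parallel. This is an illustration of the general idea under a toy objective, not the paper's q-EGO implementation.

```python
# Hedged sketch: batch Bayesian optimization with a constant-liar heuristic.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X, gp, y_best):
    mu, sigma = gp.predict(X, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma                  # minimization convention
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def propose_batch(gp, X_obs, y_obs, q, rng, bounds=(0.0, 1.0), n_cand=2000):
    """Greedily pick q points, 'lying' that each pick returns the current best."""
    X_fake, y_fake = X_obs.copy(), y_obs.copy()
    batch = []
    for _ in range(q):
        gp.fit(X_fake, y_fake)
        cand = rng.uniform(*bounds, size=(n_cand, X_obs.shape[1]))
        x_next = cand[np.argmax(expected_improvement(cand, gp, y_fake.min()))]
        batch.append(x_next)
        X_fake = np.vstack([X_fake, x_next])   # constant-liar update
        y_fake = np.append(y_fake, y_fake.min())
    return np.array(batch)

# Toy objective standing in for the (expensive) scheduling simulator.
f = lambda X: np.sin(5 * X[:, 0]) + (X[:, 0] - 0.6) ** 2

rng = np.random.default_rng(1)
X_obs = rng.uniform(size=(5, 1))
y_obs = f(X_obs)
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
batch = propose_batch(gp, X_obs, y_obs, q=4, rng=rng)  # evaluate these in parallel
print(batch.ravel())
```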
Message from the HCW Steering Committee Chair
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/ipdps.2007.370321
Behrooz Shirazi
Abstract: These are the proceedings of the "28th Heterogeneity in Computing Workshop," also known as HCW 2019. A few years ago, the title of the workshop was changed from the original "Heterogeneous Computing Workshop" to reflect the breadth of the impact of heterogeneity, as well as to stress that the focus of the workshop is on the management and exploitation of heterogeneity. All of this is, of course, taken in the context of the parent conference, the International Parallel and Distributed Processing Symposium (IPDPS), and so explores heterogeneity in parallel and distributed computing systems.
Citations: 0
Accelerating SLIDE: Exploiting Sparsity on Accelerator Architectures
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00116
Sho Ko, Alexander Rucker, Yaqi Zhang, Paul Mure, K. Olukotun
Abstract: A significant trend in machine learning is sparsifying the training of neural networks to reduce the amount of computation required. Algorithms like the Sub-LInear Deep learning Engine (SLIDE) [2] use locality-sensitive hashing (LSH) to create sparsity. These sparse training algorithms were originally developed for multi-threaded multicore CPUs; however, they are not well studied or optimized for accelerator platforms such as GPUs and reconfigurable dataflow architectures (RDAs). In this paper, we study the different variants of the SLIDE algorithm and investigate accuracy-performance tradeoffs on CPUs, GPUs, and RDAs. The implementation targeting an RDA outperforms the GPU by 7.5×. Performance on a limited-memory RDA is improved further by a proposed smart caching algorithm, which is 2× faster than the baseline RDA. Furthermore, we achieve another 2× performance gain by keeping all of the weights on-chip using an RDA with enough memory. We believe our work will pave the road for the future development of both algorithms and hardware architectures for sparse training.
Citations: 0
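The core LSH trick behind SLIDE can be sketched generically: hash each neuron's weight vector with SimHash (random signed projections), then compute only the neurons whose bucket collides with the input's bucket. The sketch below is illustrative and simplified (a single hash table, whereas SLIDE unions candidates from several tables); it is not SLIDE's actual code.

```python
# Hedged sketch: SimHash-based active-neuron selection for a sparse forward pass.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, n_neurons, n_bits = 128, 4096, 12

W = rng.normal(size=(n_neurons, d))    # dense layer weights (one row per neuron)
planes = rng.normal(size=(n_bits, d))  # shared random hyperplanes for SimHash

def simhash(v):
    """Signs of n_bits random projections, packed into an integer bucket id."""
    bits = (planes @ v) > 0
    return int(bits @ (1 << np.arange(n_bits)))

# Pre-hash every neuron's weight vector into a bucket.
table = defaultdict(list)
for j in range(n_neurons):
    table[simhash(W[j])].append(j)

def sparse_forward(x):
    """Compute activations only for neurons whose bucket collides with x's."""
    active = table[simhash(x)]
    return active, W[active] @ x

x = rng.normal(size=d)
active, acts = sparse_forward(x)
print(f"computed {len(active)} of {n_neurons} neurons")
```

Because hash collisions favor weight vectors with large inner products against the input, the selected subset tends to contain the high-activation neurons, which is what makes the sparsification tolerable for accuracy.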
A Research-Based Course Module to Study Non-determinism in High Performance Applications
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00067
Patrick Bell, Kae Suarez, Barbara Fossum, Dylan Chapp, S. Bhowmick, M. Taufer
Abstract: We present a research-based course module to teach computer science students, software developers, and scientists the effects of non-determinism on high-performance applications. The course module uses the ANACIN-X software package, a suite of software modules developed by the authors; ANACIN-X provides test cases, analytic tools to run different scenarios (e.g., using different numbers of processes and different communication patterns), and visualization tools for beginner-, intermediate-, and advanced-level understanding of non-determinism. Through our course module, computer science students, software developers, and scientists gain an understanding of non-determinism, how to measure its occurrence in an execution, and how to identify its root causes within an application's code.
Citations: 0
RAW 2022 Keynote Speaker 1: Using FPGAs in datacenters and the cloud
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00020
G. Alonso
Abstract: Several trends in the IT industry are driving an increasing specialization of the hardware layers. On the one hand, demanding workloads, large data volumes, diversity in data types, and similar factors all contribute to making general-purpose computing too inefficient. On the other hand, cloud computing and its economies of scale allow vendors to invest in specialized hardware for particular tasks that would otherwise be too expensive or consume resources needed elsewhere. In this talk I will discuss the shift toward hardware acceleration and show, with several examples from industry and research, the large role that FPGAs could play.
Citations: 0
Energy-aware neural architecture selection and hyperparameter optimization
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00125
Nathan C Frey, Dan Zhao, Simon Axelrod, Michael Jones, David Bestor, V. Gadepally, Rafael Gómez-Bombarelli, S. Samsi
Abstract: Artificial Intelligence (AI), and Deep Learning in particular, has increasing computational requirements, with a corresponding increase in energy consumption. There is a tremendous opportunity to reduce the computational cost and environmental impact of deep learning by accelerating neural network architecture search and hyperparameter optimization, as well as by explicitly designing neural architectures that optimize for both energy efficiency and performance. Here, we introduce a framework called training performance estimation (TPE), which builds upon existing techniques for training speed estimation in order to monitor energy consumption and rank model performance (without training models to convergence), saving up to 90% of the time and energy of the full training budget. We benchmark TPE in the computationally intensive, well-studied domain of computer vision and in the emerging field of graph neural networks for machine-learned interatomic potentials, an important domain for scientific discovery with heavy computational demands. We propose variants of early stopping that generalize this common regularization technique to account for energy costs, and we study the energy costs of deploying increasingly complex, knowledge-informed architectures for AI-accelerated molecular dynamics and image classification. Our work enables immediate, significant energy savings across the entire pipeline of model development and deployment and suggests new research directions for energy-aware, knowledge-informed model architecture development.
Citations: 3
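One plausible reading of an energy-aware early-stopping variant, sketched below as an assumption rather than the paper's actual rule: halt training once the marginal validation gain per unit of energy falls below a threshold. All numbers are synthetic placeholders.

```python
# Hedged sketch: stop when accuracy gained per kilojoule drops below a threshold.
def energy_aware_early_stop(history, min_gain_per_kj=0.002):
    """history: list of (val_accuracy, cumulative_energy_kJ) per epoch.
    Returns the epoch index at which to stop (or the last epoch)."""
    for i in range(1, len(history)):
        d_acc = history[i][0] - history[i - 1][0]
        d_kj = history[i][1] - history[i - 1][1]
        if d_kj > 0 and d_acc / d_kj < min_gain_per_kj:
            return i  # further accuracy is no longer worth the energy
    return len(history) - 1

# Synthetic curve: accuracy saturates while energy keeps accruing.
history = [(0.60, 5), (0.72, 10), (0.79, 15), (0.81, 20), (0.815, 25)]
print(energy_aware_early_stop(history))  # -> 4: gain/kJ has fallen below 0.002
```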