Latest Publications: 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Machine Learning Aided Hardware Resource Estimation for FPGA DNN Implementations
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00022
D. Diaconu, L. Petrica, Michaela Blott, M. Leeser
Abstract: This paper explores methods of improving hardware resource estimation for the implementation of Deep Neural Networks (DNNs) on FPGAs using machine learning algorithms. Current approaches operate at the DNN and High Level Synthesis (HLS) levels. At the DNN level, most techniques are strictly analytical, based on rough approximations and assumptions about the FPGA DNN implementation. The aim of this work is to facilitate design space exploration by providing more accurate resource estimates before running time-consuming processes such as HLS or logic synthesis. We integrated the algorithms into FINN, an end-to-end framework for building Quantized Neural Network (QNN) FPGA inference accelerators, in order to evaluate and compare them against existing estimates as well as the actual synthesized designs. We generated Support Vector Regression (SVR) models for LUT and BRAM estimation; the former yields promising results, while the latter consistently underperforms compared to the HLS and analytical FINN estimates. Combining the analytical approach used in FINN with SVR LUT estimation produced more accurate results because, on its own, SVR had insufficient extrapolation capability. For BRAM estimation, we improved the analytical approach by using a Decision Tree Classifier to predict whether a memory is implemented in distributed memory or BRAM.
Citations: 0
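The LUT regression idea lends itself to a compact illustration. Below is a minimal, hypothetical sketch of an SVR-based LUT estimator in the spirit of the abstract, using scikit-learn; the per-layer feature names and the synthetic targets are illustrative assumptions, not FINN's actual feature set or training data.

```python
# Hedged sketch: SVR-based LUT estimation on hypothetical per-layer features.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical training rows: [input_width_bits, weight_width_bits, PE_count, SIMD_lanes].
X = rng.integers(1, 64, size=(200, 4)).astype(float)
# Stand-in LUT counts; in practice these would come from post-synthesis reports.
y = 50 * X[:, 0] * X[:, 1] + 300 * X[:, 2] + rng.normal(0, 500, 200)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=10.0))
model.fit(X, y)

# Predict LUT usage for an unseen layer configuration.
print(model.predict([[8, 4, 16, 32]]))
```

As the abstract notes, a purely learned model extrapolates poorly outside the training range, which is why the paper combines it with FINN's analytical estimate.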
Families of Butterfly Counting Algorithms for Bipartite Graphs
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00060
Jay A. Acosta, Tze Meng Low, D. Parikh
Abstract: Butterflies are an important motif found in bipartite graphs that provides a structural way of finding dense regions within the graph. Beyond counting and enumerating butterflies, other metrics and peeling algorithms for bipartite graphs are designed around the butterfly motif. The importance of counting butterflies has led to many works on efficient implementations of butterfly counting under particular situational or hardware constraints. However, most algorithms first count the building block of the butterfly motif and from that calculate the total possible number of butterflies in the graph. In this paper, using a linear algebra approach, we show that many provably correct algorithms for counting butterflies can be systematically derived. Moreover, we show how this formulation facilitates butterfly peeling algorithms that find the k-tip and k-wing subgraphs within a bipartite graph.
Citations: 4
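The linear-algebra formulation builds on a standard identity: for a biadjacency matrix A, every pair of same-side vertices sharing w common neighbors closes C(w, 2) butterflies. A minimal generic sketch of that identity (not the authors' derivation framework) follows.

```python
# Hedged sketch: wedge-based butterfly counting via linear algebra.
import numpy as np
from scipy.special import comb

def count_butterflies(A: np.ndarray) -> int:
    """A is the 0/1 biadjacency matrix of a bipartite graph (rows = one side)."""
    W = A @ A.T                        # W[u, v] = common neighbors of rows u and v
    iu = np.triu_indices_from(W, k=1)  # distinct unordered pairs u < v
    # Each pair with w common neighbors closes C(w, 2) butterflies.
    return int(comb(W[iu], 2).sum())

# Tiny check: K_{2,2} contains exactly one butterfly.
A = np.array([[1, 1], [1, 1]])
print(count_butterflies(A))  # -> 1
```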
A Scalable Pipeline for Gigapixel Whole Slide Imaging Analysis on Leadership Class HPC Systems
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00223
Sajal Dash, Benjamín Hernández, A. Tsaris, Folami T. Alamudun, Hong-Jun Yoon, Feiyi Wang
Abstract: Whole Slide Imaging (WSI) captures microscopic details of a patient's histopathological features at multiple resolutions organized across different levels. Images produced by WSI are gigapixel-sized, and saving a single image in memory requires a few gigabytes, memory that is scarce since a complex model alone occupies tens of gigabytes. Performing even a simple metric operation on these large images is expensive. High-performance computing (HPC) can help us quickly analyze such large images using distributed training of complex deep learning models. One popular approach to analyzing these images is to divide a WSI image into smaller tiles (patches) and then train a simpler model with these reduced-size but numerous patches. However, pursuing this patch-based approach requires efficiently solving three pre-processing challenges. 1) Creating small patches from a high-resolution image can yield a large number of patches (hundreds of thousands per image), and storing and processing them is challenging due to the many I/O and arithmetic operations involved; an optimal balance between the size and number of patches is needed to reduce I/O and memory accesses. 2) WSI images may have tiny annotated regions of cancer tissue and a significant portion of normal and fatty tissue, so correct patch sampling must avoid dataset imbalance. 3) Storing and retrieving many patches to and from disk storage may incur I/O latency while training a deep learning model; an efficient distributed data loader should reduce I/O latency during the training and inference steps. This paper explores these three challenges and provides empirical and algorithmic solutions deployed on the Summit supercomputer hosted at the Oak Ridge Leadership Computing Facility.
Citations: 3
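As a rough illustration of challenge 2, the sketch below tiles a slide and draws a class-balanced sample of patches. The tumor-mask convention, function names, and all sizes are illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sketch: tiling plus class-balanced patch sampling for a WSI-style image.
import numpy as np

def tile_coords(height, width, patch, stride):
    """Top-left coordinates of all patches that fit inside the slide."""
    return [(y, x) for y in range(0, height - patch + 1, stride)
                   for x in range(0, width - patch + 1, stride)]

def balanced_sample(coords, mask, patch, n_per_class, rng):
    """Split patches by annotation and draw the same number from each class."""
    tumor, normal = [], []
    for (y, x) in coords:
        (tumor if mask[y:y + patch, x:x + patch].any() else normal).append((y, x))
    def take(pool):
        idx = rng.choice(len(pool), size=min(n_per_class, len(pool)), replace=False)
        return [pool[i] for i in idx]
    return take(tumor), take(normal)

rng = np.random.default_rng(0)
mask = np.zeros((4096, 4096), dtype=bool)
mask[1000:1200, 2000:2300] = True              # toy annotated tumor region
coords = tile_coords(4096, 4096, patch=256, stride=256)
tumor, normal = balanced_sample(coords, mask, 256, n_per_class=8, rng=rng)
print(len(tumor), len(normal))
```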
4th Workshop on Parallel AI and Systems for the Edge
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00156
Citations: 0
Parallel Bayesian Optimization for Optimal Scheduling of Underground Pumped Hydro-Energy Storage Systems
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00133
M. Gobert, Jan Gmys, J. Toubeau, N. Melab, D. Tuyttens, F. Vallée
Abstract: Underground Pumped Hydro-Energy Storage stations are sustainable options to enhance storage capacity and thus the flexibility of energy systems. Efficient management of such units requires high-performance optimization algorithms able to find solutions within a very restricted time frame to comply with responsive energy markets. In this context, parallel computing offers a valuable solution to ensure appropriate decisions that maximize the profit of the station operator, while guaranteeing the safety of the energy network. This study investigates the use of three existing algorithms in Parallel Bayesian Optimization, namely q-EGO, BSP-EGO, and TuRBO. The three algorithms have different inherent behaviors in terms of parallel potential and, even though TuRBO scales better, q-EGO remains the best choice with respect to the final outcome for all investigated batch sizes, achieving up to 5 times more profit than the other approaches.
Citations: 0
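For readers unfamiliar with batch Bayesian optimization, the sketch below shows a generic "constant liar" batching heuristic of the kind often used in q-EGO-style methods: the surrogate is refit q times, pretending each proposed point returned the current best value, so q candidates can be evaluated in parallel. This is an illustration of the general idea under a toy objective, not the paper's q-EGO implementation.

```python
# Hedged sketch: batch Bayesian optimization with a constant-liar heuristic.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X, gp, y_best):
    mu, sigma = gp.predict(X, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma                  # minimization convention
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def propose_batch(gp, X_obs, y_obs, q, rng, bounds=(0.0, 1.0), n_cand=2000):
    """Greedily pick q points, 'lying' that each pick returns the current best."""
    X_fake, y_fake = X_obs.copy(), y_obs.copy()
    batch = []
    for _ in range(q):
        gp.fit(X_fake, y_fake)
        cand = rng.uniform(*bounds, size=(n_cand, X_obs.shape[1]))
        x_next = cand[np.argmax(expected_improvement(cand, gp, y_fake.min()))]
        batch.append(x_next)
        X_fake = np.vstack([X_fake, x_next])   # constant-liar update
        y_fake = np.append(y_fake, y_fake.min())
    return np.array(batch)

# Toy objective standing in for the (expensive) scheduling simulator.
f = lambda X: np.sin(5 * X[:, 0]) + (X[:, 0] - 0.6) ** 2

rng = np.random.default_rng(1)
X_obs = rng.uniform(size=(5, 1))
y_obs = f(X_obs)
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
batch = propose_batch(gp, X_obs, y_obs, q=4, rng=rng)  # evaluate these in parallel
print(batch.ravel())
```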
Message from the HCW Steering Committee Chair
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/ipdps.2007.370321
Behrooz Shirazi
Abstract: These are the proceedings of the "28th Heterogeneity in Computing Workshop," also known as HCW 2019. A few years ago, the title of the workshop was changed from the original "Heterogeneous Computing Workshop" to reflect the breadth of the impact of heterogeneity, as well as to stress that the focus of the workshop is on the management and exploitation of heterogeneity. All of this is, of course, taken in the context of the parent conference, the International Parallel and Distributed Processing Symposium (IPDPS), and so explores heterogeneity in parallel and distributed computing systems.
Citations: 0
Accelerating SLIDE: Exploiting Sparsity on Accelerator Architectures
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00116
Sho Ko, Alexander Rucker, Yaqi Zhang, Paul Mure, K. Olukotun
Abstract: A significant trend in machine learning is sparsifying the training of neural networks to reduce the amount of computation required. Algorithms like the Sub-LInear Deep learning Engine (SLIDE) [2] use locality-sensitive hashing (LSH) to create sparsity. These sparse training algorithms were originally developed for multi-threaded multicore CPUs; however, they are not well studied or optimized for accelerator platforms such as GPUs and reconfigurable dataflow architectures (RDAs). In this paper, we study the different variants of the SLIDE algorithm and investigate accuracy-performance tradeoffs on CPUs, GPUs, and RDAs. The implementation targeting an RDA outperforms the GPU by 7.5×. Performance on a limited-memory RDA is improved further by a proposed smart caching algorithm, which is 2× faster than the baseline RDA. Furthermore, we achieve another 2× performance gain by keeping all of the weights on-chip using an RDA with enough memory. We believe our work will pave the road for the future development of both algorithms and hardware architectures for sparse training.
Citations: 0
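The core LSH trick behind SLIDE can be sketched generically: hash each neuron's weight vector with SimHash (random signed projections), then compute only the neurons whose bucket collides with the input's bucket. The sketch below is illustrative and simplified (a single hash table, whereas SLIDE unions candidates from several tables); it is not SLIDE's actual code.

```python
# Hedged sketch: SimHash-based active-neuron selection for a sparse forward pass.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, n_neurons, n_bits = 128, 4096, 12

W = rng.normal(size=(n_neurons, d))    # dense layer weights (one row per neuron)
planes = rng.normal(size=(n_bits, d))  # shared random hyperplanes for SimHash

def simhash(v):
    """Signs of n_bits random projections, packed into an integer bucket id."""
    bits = (planes @ v) > 0
    return int(bits @ (1 << np.arange(n_bits)))

# Pre-hash every neuron's weight vector into a bucket.
table = defaultdict(list)
for j in range(n_neurons):
    table[simhash(W[j])].append(j)

def sparse_forward(x):
    """Compute activations only for neurons whose bucket collides with x's."""
    active = table[simhash(x)]
    return active, W[active] @ x

x = rng.normal(size=d)
active, acts = sparse_forward(x)
print(f"computed {len(active)} of {n_neurons} neurons")
```

Because hash collisions favor weight vectors with large inner products against the input, the selected subset tends to contain the high-activation neurons, which is what makes the sparsification tolerable for accuracy.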
A Research-Based Course Module to Study Non-determinism in High Performance Applications
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00067
Patrick Bell, Kae Suarez, Barbara Fossum, Dylan Chapp, S. Bhowmick, M. Taufer
Abstract: We present a research-based course module to teach computer science students, software developers, and scientists the effects of non-determinism on high-performance applications. The course module uses the ANACIN-X software package, a suite of software modules developed by the authors; ANACIN-X provides test cases, analytic tools to run different scenarios (e.g., using different numbers of processes and different communication patterns), and visualization tools for beginner-, intermediate-, and advanced-level understanding of non-determinism. Through our course module, computer science students, software developers, and scientists gain an understanding of non-determinism, how to measure its occurrence in an execution, and how to identify its root causes within an application's code.
Citations: 0
RAW 2022 Keynote Speaker 1: Using FPGAs in datacenters and the cloud
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00020
G. Alonso
Abstract: Several trends in the IT industry are driving an increasing specialization of the hardware layers. On the one hand, demanding workloads, large data volumes, diversity in data types, and similar factors all contribute to making general-purpose computing too inefficient. On the other hand, cloud computing and its economies of scale allow vendors to invest in specialized hardware for particular tasks that would otherwise be too expensive or consume resources needed elsewhere. In this talk I will discuss the shift toward hardware acceleration and show, with several examples from industry and research, the large role that FPGAs could play.
Citations: 0
Energy-aware neural architecture selection and hyperparameter optimization
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00125
Nathan C Frey, Dan Zhao, Simon Axelrod, Michael Jones, David Bestor, V. Gadepally, Rafael Gómez-Bombarelli, S. Samsi
Abstract: Artificial Intelligence (AI), and Deep Learning in particular, has increasing computational requirements, with a corresponding increase in energy consumption. There is a tremendous opportunity to reduce the computational cost and environmental impact of deep learning by accelerating neural network architecture search and hyperparameter optimization, as well as by explicitly designing neural architectures that optimize for both energy efficiency and performance. Here, we introduce a framework called training performance estimation (TPE), which builds upon existing techniques for training speed estimation in order to monitor energy consumption and rank model performance (without training models to convergence), saving up to 90% of the time and energy of the full training budget. We benchmark TPE in the computationally intensive, well-studied domain of computer vision and in the emerging field of graph neural networks for machine-learned interatomic potentials, an important domain for scientific discovery with heavy computational demands. We propose variants of early stopping that generalize this common regularization technique to account for energy costs, and we study the energy costs of deploying increasingly complex, knowledge-informed architectures for AI-accelerated molecular dynamics and image classification. Our work enables immediate, significant energy savings across the entire pipeline of model development and deployment and suggests new research directions for energy-aware, knowledge-informed model architecture development.
Citations: 3
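One plausible reading of an energy-aware early-stopping variant, sketched below as an assumption rather than the paper's actual rule: halt training once the marginal validation gain per unit of energy falls below a threshold. All numbers are synthetic placeholders.

```python
# Hedged sketch: stop when accuracy gained per kilojoule drops below a threshold.
def energy_aware_early_stop(history, min_gain_per_kj=0.002):
    """history: list of (val_accuracy, cumulative_energy_kJ) per epoch.
    Returns the epoch index at which to stop (or the last epoch)."""
    for i in range(1, len(history)):
        d_acc = history[i][0] - history[i - 1][0]
        d_kj = history[i][1] - history[i - 1][1]
        if d_kj > 0 and d_acc / d_kj < min_gain_per_kj:
            return i  # further accuracy is no longer worth the energy
    return len(history) - 1

# Synthetic curve: accuracy saturates while energy keeps accruing.
history = [(0.60, 5), (0.72, 10), (0.79, 15), (0.81, 20), (0.815, 25)]
print(energy_aware_early_stop(history))  # -> 4: gain/kJ has fallen below 0.002
```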