{"title":"Essentials of Parallel Graph Analytics","authors":"M. Osama, Serban D. Porumbescu, John Douglas Owens","doi":"10.1109/IPDPSW55747.2022.00061","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00061","url":null,"abstract":"We identify the graph data structure, frontiers, operators, an iterative loop structure, and convergence conditions as essential components of graph analytics systems based on the native-graph approach. Using these essential components, we propose an abstraction that captures all the significant programming models within graph analytics, such as bulk-synchronous, asynchronous, shared-memory, message-passing, and push vs. pull traversals. Finally, we demonstrate the power of our abstraction with an elegant modern C++ implementation of single-source shortest path and its required components.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123851132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EDAML 2022 Invited Speaker 2: AI Algorithm and Accelerator Co-design for Computing on the Edge","authors":"Deming Chen","doi":"10.1109/IPDPSW55747.2022.00195","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00195","url":null,"abstract":"In a conventional top-down design flow, deep-learning algorithms are first designed concentrating on the model accuracy, and then accelerated through hardware accelerators trying to meet various system design targets on power, energy, speed, and cost. However, this approach often does not work well because it ignores the physical constraints that the hardware architectures themselves would have towards the deep neural network (DNN) algorithm design and deployment, especially for the DNNs that will be deployed unto edge devices. Thus, an ideal scenario is that algorithms and their hardware accelerators are developed simultaneously. In this talk, we will present our DNN/Accelerator co-design and co-search methods. Our results have shown great promises for delivering high-performance hardware-tailored DNNs and DNNtailored accelerators naturally and elegantly. One of the DNN models coming out of this co-design method, called SkyNet, won a double championship in the competitive DAC System Design Contest for both the GPU and the FPGA tracks for low-power object detection.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115881091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Coarse Grained Reconfigurable Architecture for SHA-2 Acceleration","authors":"H. Pham, T. Tran, Luc Duong, Y. Nakashima","doi":"10.1109/IPDPSW55747.2022.00117","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00117","url":null,"abstract":"The development of high-speed SHA-2 hardware with high flexibility is urgently needed because SHA-2 functions are widely employed in numerous fields, from loT devices to cryp-to currency. Unfortunately, the existing SHA-2 circuits have difficulty in achieving high flexibility and hardware efficiency. Therefore, this paper proposes a coarse-grained reconfigurable architecture (CGRA) for accelerating SHA-2 computation, named a CGRA SHA-2 accelerator. To effectively support various algorithms and requirements, three optimization techniques are proposed to achieve high flexibility and hardware efficiency. First, an on-demand pro-cessing element array is proposed to enable flexible computation for long and short messages. Second, a dual-ALU processing element (D-PE) is proposed to compute various SHA-2 functions. Third, the pipelined dual-ALU architecture is proposed to reduce the critical paths, leading to remarkably improved performance and hardware efficiency. The accuracy of our proposed accelerator is verified on a real hardware platform (the Xilinx Alveo U280 FPGA). Besides, the experimental results on several FPGAs prove that the proposed CGRA SHA-2 accelerator is significantly higher performance, hardware efficiency, and flexibility than existing works.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117013960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HCW 2022 Keynote Speaker: Heterogeneous Computing for Scientific Machine Learning","authors":"L. White","doi":"10.1109/IPDPSW55747.2022.00011","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00011","url":null,"abstract":"More than ever, the semiconductor industry is asked to answer society's call for more computing capacity and capability, which are driven by rapid digitalization, the widespread adoption of artificial intelligence, and the ever-increasing need for high-fidelity scientific simulations. While facing high demand, the supply of computing capability is being technically challenged by the slowdown of Moore's law and the need for high energy efficiency. This tug-of-war has now pushed the industry towards domain-specific accelerators, perhaps likely past the point of no return. The mix of general-purpose CPUs and high-end GPGPUs, which has pervaded data centers over the past few years, is likely to be expanded to a much richer set of application-specific accelerators, including AI engines, reconfigurable hardware, and even perhaps quantum, annealing, and neuromorphic devices. While acceleration and better efficiency may be enabled by using domain-specific accelerators for selected workloads, a much more holistic (i.e., system-wide) approach will have to be adopted to achieve significant performance gains for complex applications that consist of a variety of workloads where each could benefit from a specific accelerator. As an important example, scientific computing, which increasingly incorporates AI training and inference kernels in a tightly-integrated fashion, provides a rich and exciting laboratory for addressing the challenges of efficiently using highly-heterogeneous systems and for ultimately realizing their promises. Those challenges include co-designing the application, which requires domain experts to collaborate with other experts across the stack for workload mapping and data orchestration, and also adopting a decentralized strategy that embeds processing units where the data need them. Finally, the early experience of those co-design efforts should help the industry devise a longer-term strategy for developing programming models that would relieve application experts from what is often perceived as the burden of hardwareaware development and code optimization.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114330002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When and How to Retrain Machine Learning-based Cloud Management Systems","authors":"Lidia Kidane, P. Townend, Thijs Metsch, E. Elmroth","doi":"10.1109/IPDPSW55747.2022.00120","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00120","url":null,"abstract":"Cloud management systems increasingly rely on machine learning (ML) models to predict incoming workload rates, load, and other system behaviours for efficient dynamic resource management. Current state-of-the-art prediction models demonstrate high accuracy but assume that data patterns remain stable. However, in production use, systems may face hardware upgrades, changes in user behaviour etc. that lead to concept drifts - significant changes in the characteristics of data streams over time. To mitigate prediction deterioration, ML models need to be updated - but questions of when and how to best retrain these models are unsolved in the context of cloud management. We present a pilot study that addresses these questions for one of the most common models for adaptive prediction - Long Short Term Memory (LSTM) - using synthetic and real-world workload data. Our analysis of when to retrain explores approaches for detecting when retraining is required using both concept drift detection and prediction error thresholds, and at what point retraining should actually take place. Our analysis of how to retrain focuses on the data required for retraining, and what proportion should be taken from before and after the need for retraining is detected. We present initial results that indicate that retraining of existing models can achieve prediction accuracy close to that of newly trained models but for much less cost, and present initial advice for how to provide cloud management systems with support for automatic retraining of ML-based models.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114496301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Green(er) World for A.I.","authors":"Dan Zhao, Nathan C Frey, Joseph McDonald, M. Hubbell, David Bestor, Michael Jones, Andrew Prout, V. Gadepally, S. Samsi","doi":"10.1109/IPDPSW55747.2022.00126","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00126","url":null,"abstract":"As research and practice in artificial intelligence (A.I.) grow in leaps and bounds, the resources necessary to sustain and support their operations also grow at an increasing pace. While innovations and applications from A.I. have brought significant advances, from applications to vision and natural language to improvements to fields like medical imaging and materials engineering, their costs should not be neglected. As we embrace a world with ever-increasing amounts of data as well as research & development of A.I. applications, we are sure to face an ever-mounting energy footprint to sustain these computational budgets, data storage needs, and more. But, is this sustainable and, more importantly, what kind of setting is best positioned to nurture such sustainable A.I. in both research and practice? In this paper, we outline our outlook for Green A.I.—a more sustainable, energy-efficient and energy-aware ecosystem for developing A.I. across the research, computing, and practitioner communities alike—and the steps required to arrive there. We present a bird's eye view of various areas for potential changes and improvements from the ground floor of AI's operational and hardware optimizations for datacenter/HPCs to the current incentive structures in the world of A.I. research and practice, and more. We hope these points will spur further discussion, and action, on some of these issues and their potential solutions.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115756157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AsHES 2022 Keynote Speaker: The Modular Supercomputing Architecture (MSA)","authors":"E. Suarez","doi":"10.1109/IPDPSW55747.2022.00071","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00071","url":null,"abstract":"The Modular Supercomputing Architecture (MSA) is a system design that orchestrates heterogeneous computer resources (CPUs, GPUs, many-core accelerators, disruptive technologies, etc.) at system-level, organizing them in compute modules. Modules are clusters of potentially large size, each configured with a specific type of user requirement in mind. The different modules are interconnected via a high-speed network, and a common software stack brings all modules together creating a unique machine. The MSA aims at supporting a large diversity of applications and has been developed at the Jülich Supercomputing Centre (JSC) through the EU-funded DEEP projects.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127141699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EDAML 2022 Invited Speaker 5: Combining Optimization and Machine Learning in Physical Design","authors":"L. Behjat","doi":"10.1109/IPDPSW55747.2022.00198","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00198","url":null,"abstract":"The exponential increase in computing power and the availability of big data have ignited innovations in EDA. The most recent trend in innovations has involved using machine learning algorithms for solving problems of scale. Machine learning techniques can solve large-scale problems efficiently once they are trained. However, their training takes a large amount of computing power and might not translate well from one type of problem to another. On the other hand, many of the existing algorithms in physical design take advantage of mathematical optimization techniques to improve their solution quality. These techniques can find optimal or near-optimal solutions using fast heuristics. These techniques do not require a large amount of data but need some level of insight into the nature of the problem by the designer. The mathematical optimization techniques rely heavily on the developed models. In this talk, we will discuss how machine learning can be used to develop better models for optimization problems and how optimization techniques can then use the models to generate more data to improve the accuracy and robustness of machine learning techniques. We will first discuss the algorithm-driven nature of the optimization techniques and compare that to the data-driven nature of the machine learning techniques. We will use examples of physical design placement and routing. Then, we will discuss how optimization and ML can be used to solve the problems of scale both in numbers and transistor sizes. We will also discuss how reinforcement learning can be used to come up with new heuristics for solving the problems encountered in physical design. The talk will end with some practical suggestions on how to improve the quality and speed of the design.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126971479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synchronous parallel multisplitting method with convergence acceleration using a local Krylov-based minimization for solving linear systems","authors":"Médane A. Tchakorom, R. Couturier, Jean-Claude Charr","doi":"10.1109/IPDPSW55747.2022.00146","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00146","url":null,"abstract":"Computer simulations of physical phenomena, such as heat transfer, often require the solution of linear equations. These linear equations occur in the form Ax $=mathbf{b}$, where A is a matrix, $mathbf{b}$ is a vector, and $mathbf{x}$ is the vector of unknowns. Iterative methods are the most adapted to solve large linear systems because they can be easily parallelized. This paper presents a variant of the multisplitting iterative method with convergence acceleration using the Krylov-based minimization method. This paper particularly focuses on improving the convergence speed of the method with an implementation based on the PETSc (Portable Extensible Toolkit for Scientific Computation) library. This was achieved by reducing the need for synchronization - data exchange - during the minimization process and adding a preconditioner before the multisplitting method. All experiments were performed either over one or two sites of the Grid5000 platform and up to 128 cores were used. The results for solving a 2D Laplacian problem of size 10242 components, show a speed up of up to 23X and 86X when respectively compared to the algorithm in [8] and to the general multisplitting implementation.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124914648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Optimization for Sparse Data on Heterogeneous GPUs","authors":"Yujing Ma, Florin Rusu, Kesheng Wu, A. Sim","doi":"10.1109/IPDPSW55747.2022.00177","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00177","url":null,"abstract":"Motivated by extreme multi-label classification applications, we consider training deep learning models over sparse data in multi-GPU servers. The variance in the number of non-zero features across training batches and the intrinsic GPU heterogeneity combine to limit accuracy and increase the time to convergence. We address these challenges with Adaptive SGD, an adaptive elastic model averaging stochastic gradient descent algorithm for heterogeneous multi-GPUs that is characterized by dynamic scheduling, adaptive batch size scaling, and normalized model merging. Instead of statically partitioning batches to GPUs, batches are routed based on the relative processing speed. Batch size scaling assigns larger batches to the faster GPUs and smaller batches to the slower ones, with the goal to arrive at a steady state in which all the GPUs perform the same number of model updates. Normalized model merging computes optimal weights for every GPU based on the assigned batches such that the combined model achieves better accuracy. We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy and is scalable with the number of GPUs.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121624075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}