2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD): Latest Publications

Towards Pervasive Containerization of HPC Job Schedulers

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI: 10.1109/SBAC-PAD49847.2020.00046

C. Cérin, Nicolas Grenèche, Tarek Menouer

Abstract: In cloud computing, elasticity is defined as "the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible". Adding elasticity to HPC (High Performance Computing) cluster management systems remains challenging even if we deploy such HPC systems in today's cloud environments. This difficulty arises because an HPC job scheduler needs to rely on a fixed set of resources. Every change of topology (adding or removing computing resources) leads to a global restart of the HPC job scheduler. This phenomenon is not a major drawback because it provides a very effective way of sharing a fixed set of resources, but we think that it could be complemented by a more elastic approach. Moreover, the elasticity issue should not be reduced to the issue of scaling resources. Clouds also enable access to various technologies that enhance the services offered to users. In this paper, our approach is to use container technology to instantiate a tailored HPC environment based on the user's reservation constraints. We claim that the introduction and use of containers in HPC job schedulers allow better management of resources, in a more economical way. From the use case of SLURM, we release a methodology for 'containerization' of HPC job schedulers which is pervasive, i.e., spreading widely throughout all layers of job schedulers. We also provide initial experiments demonstrating that our containerized SLURM system is operational and promising.

Citations: 7

A Robotic Communication Middleware Combining High Performance and High Reliability

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI: 10.1109/SBAC-PAD49847.2020.00038

Wei Liu, Hao Wu, Ziyue Jiang, Yifan Gong, Jiangming Jin

Abstract: With the significant advances of AI technology, intelligent robotic systems have achieved remarkable development and profound effects. To enable massive data transmission in an efficient and reliable way, both high performance and high reliability should be taken into account in system design. However, the conventional communication middleware used in the majority of autonomous robotic systems is based on socket-based methods, which always lead to high latency. Moreover, some sophisticated communication middleware utilizes shared memory upon ring buffers for high performance without consideration of reliability. To obtain both high performance and high reliability, we employ shared memory for performance improvement and propose a novel socket-based communication control algorithm to improve reliability during data transmission. Furthermore, based on the proposed algorithm, we implement a novel robotic communication middleware, named Robust-Z, combining both high performance and high reliability. Experimental results show that (1) Robust-Z is able to gain up to 41% and 5% performance improvement compared to ROS2 and Apollo CyberRT, respectively; (2) Robust-Z is able to provide crash safety and reduce the data missing rate by 5.2% compared with CyberRT.

Citations: 7

OmpTracing: Easy Profiling of OpenMP Programs

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI: 10.1109/SBAC-PAD49847.2020.00042

Vitoria Pinho, H. Yviquel, M. Pereira, G. Araújo

Citations: 0

Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI: 10.1109/SBAC-PAD49847.2020.00013

Nicolas Bohm Agostini, Shi Dong, Elmira Karimi, Marti Torrents Lapuerta, José Cano, José L. Abellán, D. Kaeli

Abstract: Recently there has been a rapidly growing demand for faster machine learning (ML) processing in data centers and migration of ML inference applications to edge devices. These developments have prompted both industry and academia to explore custom accelerators to optimize ML executions for performance and power. However, identifying which accelerator is best equipped for performing a particular ML task is challenging, especially given the growing range of ML tasks, the number of target environments, and the limited number of integrated modeling tools. To tackle this issue, it is of paramount importance to provide the computer architecture research community with a common framework capable of performing a comprehensive, uniform, and fair comparison across different accelerator designs targeting a particular ML task. To this aim, we propose a new framework named TFLITE-SOC (System On Chip) that integrates a lightweight system modeling library (SystemC) for fast design space exploration of custom ML accelerators into the build/execution environment of Tensorflow Lite (TFLite), a highly popular ML framework for ML inference. Using this approach, we are able to model and evaluate new accelerators developed in SystemC by leveraging the language's hierarchical design capabilities, resulting in faster design prototyping. Furthermore, any accelerator designed using TFLITE-SOC can be benchmarked for inference with any DNN model compatible with TFLite, which enables end-to-end DNN processing and detailed (i.e., per DNN layer) performance analysis. In addition to providing rapid prototyping, integrated benchmarking, and a range of platform configurations, TFLITE-SOC offers comprehensive performance analysis of accelerator occupancy and execution time breakdown, as well as a rich set of modules that can be used by new accelerators to implement scale-up studies and optimized memory transfer protocols. We present our framework and demonstrate its utility by considering the design space of a TPU-like systolic array and describing possible directions for optimization. Using a compression technique, we implement an optimization that targets reducing the memory traffic between DRAM and on-device buffers. Compared to the baseline accelerator, our optimized design shows up to 1.26x speedup on accelerated operations and up to 1.19x speedup on end-to-end DNN execution.

Citations: 9

Reliable and Energy-aware Mapping of Streaming Series-parallel Applications onto Hierarchical Platforms

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI: 10.1109/SBAC-PAD49847.2020.00026

Changjiang Gou, A. Benoit, Mingsong Chen, L. Marchal, Tongquan Wei

Citations: 0

An Optimal Model for Optimizing the Placement and Parallelism of Data Stream Processing Applications on Cloud-Edge Computing

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI: 10.1109/SBAC-PAD49847.2020.00019

Felipe Rodrigo de Souza, M. Assunção, E. Caron, A. Veith

Abstract: The Internet of Things has enabled many application scenarios where a large number of connected devices generate unbounded streams of data, often processed by data stream processing frameworks deployed in the cloud. Edge computing enables offloading processing from the cloud and placing it close to where the data is generated, thereby reducing the time to process data events and deployment costs. However, edge resources are more computationally constrained than their cloud counterparts, raising two interrelated issues, namely deciding on the parallelism of processing tasks (a.k.a. operators) and their mapping onto available resources. In this work, we formulate the scenario of operator placement and parallelism as an optimal mixed-integer linear programming problem. The proposed model is termed Cloud-Edge data Stream Placement (CESP). Experimental results using discrete-event simulation demonstrate that CESP can achieve an end-to-end latency at least ≃ 80% and monetary costs at least ≃ 30% better than traditional cloud deployment.

Citations: 10

Performance Analysis and Optimization of the Vector-Kronecker Product Multiplication

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI: 10.1109/SBAC-PAD49847.2020.00044

Alexandre Azevedo, C. Bentes, Maria Clicia Stelling de Castro, C. Tadonki

Citations: 2

Scalable and Efficient Spatial-Aware Parallelization Strategies for Multimedia Retrieval

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI: 10.1109/SBAC-PAD49847.2020.00027

Guilherme Andrade, George Teodoro, R. Ferreira

Abstract: Similarity search is a key operation in several multimedia applications, including online Content-Based Multimedia Retrieval (CBMR) services. These applications have to deal with very large databases and are subjected to high query rates. In this context, scalability in distributed memory systems is critical to assemble the required computing power and memory space. However, we have identified that the Data Equal Split (DES) parallelization and associated data partition strategy employed by related work in the domain have limitations in terms of efficiency and scalability. Therefore, in this paper, we developed and implemented a framework for similarity search execution on distributed memory machines and proposed a novel class of data partition strategies that takes into account the spatial organization of the data in its distribution. This approach leads to a reduction in communication traffic and in costs associated with processing each task in local searches carried out in the distributed machine. Our approach attained a speedup of 2.4× on top of DES in the baseline case (5 nodes); it also achieves higher scalability efficiency and is 14.5× faster when 160 nodes are used. In fact, our novel data organization led to superlinear scalability in all configurations evaluated.

Citations: 2

Energy-Efficient Time Series Analysis Using Transprecision Computing

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI: 10.1109/SBAC-PAD49847.2020.00022

Ivan Fernandez, Ricardo Quislant, E. Gutiérrez, O. Plata

Abstract: Time series analysis is a key step in monitoring and predicting events over time in domains such as epidemiology, genomics, medicine, seismology, speech recognition, and economics. Matrix Profile has been recently proposed as a promising technique to perform time series analysis. For each subsequence, the matrix profile provides the most similar neighbour in the time series. This computation requires a huge amount of floating-point (FP) operations, which are a major contributor (approximately 50%) to the energy consumption in modern computing platforms. Transprecision Computing has recently emerged as a promising approach to improve energy efficiency and performance by tolerating some loss of precision in FP operations. In this work, we study how the matrix profile parallel algorithms benefit from transprecision computing using a recently proposed transprecision FPU. This FPU is intended to be integrated on embedded devices as part of RISC-V processors, FPGAs or ASICs to perform energy-efficient time series analysis. To this end, we propose an accuracy metric to compare the results with the double precision matrix profile. We use this metric to explore a wide range of exponent and mantissa combinations for a variety of datasets, as well as a mixed precision and a vectorized approach. Our analysis reveals that the energy consumption is reduced by up to 3.3x compared with double precision approaches, while only slightly affecting the accuracy.
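
The abstract's description of the matrix profile ("for each subsequence, the most similar neighbour in the time series") can be illustrated with a brute-force sketch. This is not the authors' parallel algorithm and does not model the transprecision FPU: it runs in ordinary double precision, and the function names, the z-normalized Euclidean distance, and the exclusion zone of half the window length are assumptions chosen for illustration.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def matrix_profile(series, m):
    """Brute-force matrix profile for window length m.

    Returns (profile, index): profile[i] is the distance from subsequence i
    to its nearest non-trivial neighbour, index[i] is that neighbour's offset.
    """
    n_sub = len(series) - m + 1
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial self-matches
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) < excl:
                continue
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


if __name__ == "__main__":
    # A perfectly periodic series: every window has an exact repeat.
    profile, index = matrix_profile([0, 1, 0, -1] * 4, 4)
    print(profile[0], index[0])
```

The quadratic number of distance computations in the inner loop is what makes the floating-point cost, and hence the choice of precision, significant for long series.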

Citations: 1

MASA-StarPU: Parallel Sequence Comparison with Multiple Scheduling Policies and Pruning

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI: 10.1109/SBAC-PAD49847.2020.00039

Rafael A. Lopes, Samuel Thibault, A. Melo

Abstract: Sequence comparison tools based on the Smith-Waterman (SW) algorithm provide the optimal result but have high execution times when the sequences compared are long, since a huge dynamic programming (DP) matrix is computed. Block pruning is an optimization that does not compute some parts of the DP matrix and can considerably reduce the execution time when the sequences compared are similar. However, block pruning's resulting task graph is dynamic and irregular. Since different pruning scenarios lead to different pruning shapes, we advocate that no single scheduling policy will behave the best for all scenarios. This paper proposes MASA-StarPU, a sequence aligner that integrates the domain-specific framework MASA with the generic programming environment StarPU, creating a tool which has the benefits of StarPU (i.e., multiple task scheduling policies) and MASA (i.e., fast sequence alignment). MASA-StarPU was executed on two different multicore platforms and the results show that a bad choice of the scheduling policy may have a great impact on the performance. For instance, using 24 cores, the 5M x 5M comparison took 1484s with the dmdas policy whereas the same comparison took 3601s with lws. We also show that no scheduling policy behaves the best for all scenarios.
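
To make the dynamic programming matrix mentioned in the abstract concrete, here is a minimal sketch of Smith-Waterman local alignment scoring. It is a simplified illustration, not MASA's implementation: it assumes a linear gap penalty and hypothetical scoring values (match=1, mismatch=-1, gap=-2), computes only the best score, and performs no block pruning or task scheduling.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Only two rows of the DP matrix are kept in memory, since each cell
    depends on its left, upper, and upper-left neighbours only.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Every cell of the len(a) × len(b) matrix is visited here, which is why comparisons of multi-million-character sequences are expensive and why skipping blocks that cannot improve the best score pays off when the sequences are similar.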

Citations: 2