2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): Latest Papers

AutoPas: Auto-Tuning for Particle Simulations
F. Gratl, Steffen Seckler, Nikola Tchipev, H. Bungartz, Philipp Neumann
{"title":"AutoPas: Auto-Tuning for Particle Simulations","authors":"F. Gratl, Steffen Seckler, Nikola Tchipev, H. Bungartz, Philipp Neumann","doi":"10.1109/IPDPSW.2019.00125","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00125","url":null,"abstract":"The C++ library AutoPas aims at delivering optimal node-level performance for particle simulations. This paper describes the internally implemented algorithms, and how the library uses auto-tuning to dynamically select their optimal combination at run-time. Results are presented, which show that all available algorithms and configuration options have their specific advantages. To demonstrate the library's capabilities in relevant application settings, it has been integrated into the software package ls1 mardyn. An example of a realistic molecular dynamics simulation from thermodynamics is shown in which AutoPas detects a change in the best possible algorithm configuration. It adapts the simulation algorithm accordingly, sustaining optimal performance without additional user input.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128616496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
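The run-time selection described in this abstract can be illustrated with a minimal sketch. It is not AutoPas's implementation or API (AutoPas is a C++ library); the class `AutoTuner`, the labels `config_a` and `config_b`, and the synthetic timings are all hypothetical. The idea shown is only the general pattern: time every candidate configuration during a tuning phase, run with the fastest one, and repeat the tuning phase periodically so that a change in the best configuration is noticed.

```python
from typing import Dict, List, Optional


class AutoTuner:
    """Hypothetical tuner: time each configuration once, then keep the fastest."""

    def __init__(self, configs: List[str], retune_every: int) -> None:
        if not configs or retune_every < len(configs):
            raise ValueError("need >= 1 config and retune_every >= len(configs)")
        self.configs = list(configs)
        self.retune_every = retune_every  # iterations between tuning phases
        self.best: Optional[str] = None
        self._timings: Dict[str, float] = {}
        self._iteration = 0

    def next_config(self) -> str:
        """Return the configuration to use for the upcoming iteration."""
        if self._iteration % self.retune_every == 0:
            # Start a new tuning phase: forget old measurements.
            self._timings.clear()
            self.best = None
        for config in self.configs:
            if config not in self._timings:
                return config  # still sampling candidates
        assert self.best is not None
        return self.best

    def report(self, config: str, seconds: float) -> None:
        """Record the measured time of the iteration that just finished."""
        if self.best is None:
            self._timings[config] = seconds
            if len(self._timings) == len(self.configs):
                self.best = min(self._timings, key=self._timings.__getitem__)
        self._iteration += 1


def synthetic_cost(iteration: int, config: str) -> float:
    """Made-up timings: config_a wins early, config_b wins from iteration 10 on."""
    if config == "config_a":
        return 1.0 if iteration < 10 else 3.0
    return 2.0


def run_demo(iterations: int = 20) -> List[str]:
    """Drive the tuner with the synthetic timings and log the chosen configs."""
    tuner = AutoTuner(["config_a", "config_b"], retune_every=10)
    chosen = []
    for i in range(iterations):
        config = tuner.next_config()
        tuner.report(config, synthetic_cost(i, config))
        chosen.append(config)
    return chosen


if __name__ == "__main__":
    print(run_demo())
```

In this toy run the tuner settles on `config_a` after the first tuning phase and switches to `config_b` after the second, without any user input. A real tuner would have to weigh the cost of the sampling iterations against the gain from switching.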
Citations: 16
Classifying Pedagogical Material to Improve Adoption of Parallel and Distributed Computing Topics
Alec Goncharow, Anna Boekelheide, Matthew Mcquaigue, David Burlinson, Erik Saule, K. Subramanian, J. Payton
{"title":"Classifying Pedagogical Material to Improve Adoption of Parallel and Distributed Computing Topics","authors":"Alec Goncharow, Anna Boekelheide, Matthew Mcquaigue, David Burlinson, Erik Saule, K. Subramanian, J. Payton","doi":"10.1109/IPDPSW.2019.00060","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00060","url":null,"abstract":"The NSF/IEEE-TCPP Parallel and Distributed Computing curriculum guidelines released in 2012 (PDC12) is an effort to bring more parallel computing education to early computer science courses. It has been moderately successful, with the inclusion of some PDC topics in the ACM/IEEE Computer Science curriculum guidelines in 2013 (CS13) and some coverage of topics in early CS courses in some universities in the U.S. and around the world. A reason often cited for the lack of a broader adoption is the difficulty for instructors who are not already knowledgeable in PDC topics to learn how to teach those topics and align their learning objectives with early CS courses. There have been attempts at bringing textbook chapters, lecture slides, assignments, and demos to the hands of the instructors of early CS classes. However, the effort required to plow through all the available materials and figure out what is relevant to a particular class is daunting. This paper argues that classifying pedagogical materials against the CS13 guidelines and the PDC12 guidelines can provide the means necessary to reduce the burden of adoption for instructors. 
In this paper, we present CAR-CS, a system that can be used to categorize pedagogical materials according to well-known and established curricular guidelines and show that CAR-CS can be leveraged 1) by PDC experts to identify topics for which pedagogical material does not exist and that should be developed, 2) by instructors of early CS courses to find materials that are similar to the one that they use but that also cover PDC topics, 3) by instructors to check the topics that a course currently covers and those it does not cover.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"44 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131457129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Intelligent Control Navigation Emerging on Multiple Mobile Robots Applying Social Wound Treatment
Hiram Ponce, Paulo Vitor de Campos Souza
{"title":"Intelligent Control Navigation Emerging on Multiple Mobile Robots Applying Social Wound Treatment","authors":"Hiram Ponce, Paulo Vitor de Campos Souza","doi":"10.1109/IPDPSW.2019.00098","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00098","url":null,"abstract":"In robotics, learning new tasks is a complex solving problem. This learning depends on the environment, the robot configuration, the difficulty of the problem task, even the prior knowledge. Reinforcement learning has been widely employed for learning from scratch and policy search; however, it is very time-consuming. Multi-robots, as collaborative learners, have been proposed to improve the speed of learning in robotics. In this paper, we propose a collaborative intelligent control navigation strategy in robots, including a social wound treatment approach, such that robots can jointly learn how to avoid obstacles and move freely around the environment. This collective learning about social treatment aims to detect unexpected or inefficient behaviors of the robots, allowing them to redirect the right tasks with more agility, as observed in some animals. Experimental results over a multiple homogeneous robot system simulation validated our proposal.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132589578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
An Accurate Tool for Modeling, Fingerprinting, Comparison, and Clustering of Parallel Applications Based on Performance Counters
Vitor Ramos, C. Valderrama, S. X. D. Souza, P. Manneback
{"title":"An Accurate Tool for Modeling, Fingerprinting, Comparison, and Clustering of Parallel Applications Based on Performance Counters","authors":"Vitor Ramos, C. Valderrama, S. X. D. Souza, P. Manneback","doi":"10.1109/IPDPSW.2019.00130","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00130","url":null,"abstract":"The analysis of application performance is essential to better exploit its potential on High-Performance Computing (HPC) architectures. Access to performance counters, available in modern processors, allows collecting key information about program behavior to provide the most appropriate HPC execution strategy. In this context, we have developed an accurate tool based on performance counters, which facilitates modeling, fingerprinting, behavior comparison and clustering of applications. It provides a high-level Python API for accessing and configuring performance counters, while execution and counter gathering are performed by a C++ module to reduce overhead. Moreover, the accuracy of this multiplatform tool was also compared to existing alternatives. Key features, such as performance counters collection, post-processing, and comparison, enable fingerprinting of applications, an important step in understanding program behavior for later classification and optimization according to the parameters characterizing the target HPC platform. For demonstration purposes, the tool was used in the clustering of Polybench applications, a frequently used benchmark set for kernel monitoring. 
This clustering helped the identification of applications with similar and comparable behaviors in terms of input size, data accesses and movements, resource utilization, and computation, which facilitates the creation of test sets for a given environment, according to specific measurement parameters.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131293248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A Linear Solver Framework for Flow and Geomechanics Reservoir Simulation
L. Gasparini, J. Rodrigues, C. Conopoima, D. A. Augusto, Michael Souza, L. M. Carvalho, P. Goldfeld, João Paulo Ramirez, J. Panetta
{"title":"A Linear Solver Framework for Flow and Geomechanics Reservoir Simulation","authors":"L. Gasparini, J. Rodrigues, C. Conopoima, D. A. Augusto, Michael Souza, L. M. Carvalho, P. Goldfeld, João Paulo Ramirez, J. Panetta","doi":"10.1109/IPDPSW.2019.00119","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00119","url":null,"abstract":"This paper describes a parallel solver framework focused on flow and geomechanics reservoir simulation applications. It has been designed to run efficiently on a wide range of target platforms, from desktop workstations to heterogeneous clusters of multicore nodes, with or without GPUs, using a framework for distributed matrices and vectors based on a two-tier hierarchical architecture. Results show good parallel scalability on clusters of multicore nodes. Comparisons with the PETSc library indicate it is competitive with the best available tools. Preliminary tests indicate good speedups and parallel scalability also on multiple GPUs.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131529356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Parallel Processing on FPGA Combining Computation and Communication in OpenCL Programming
N. Fujita, Ryohei Kobayashi, Y. Yamaguchi, T. Boku
{"title":"Parallel Processing on FPGA Combining Computation and Communication in OpenCL Programming","authors":"N. Fujita, Ryohei Kobayashi, Y. Yamaguchi, T. Boku","doi":"10.1109/IPDPSW.2019.00089","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00089","url":null,"abstract":"In recent years, Field Programmable Gate Array (FPGA) has been a topic of interest in High Performance Computing (HPC) research. Although the biggest problem in utilizing FPGAs for HPC applications is in the difficulty of developing FPGAs, this problem is being solved by High Level Synthesis (HLS). We focus on very high-performance inter-FPGA communication capabilities. The absolute floating-point performance of an FPGA is lower than that of other common accelerators such as GPUs. However, we consider that we can apply FPGAs to a wide variety of HPC applications if we can combine computations and communications on an FPGA. The purpose of this paper is to implement a parallel processing system running applications implemented by HLS combining computations and communications in FPGAs. We propose the Channel over Ethernet (CoE) system that connects multiple FPGAs directly for OpenCL parallel programming. \"Channel\" is one of the new extensions provided by the Intel OpenCL environment. Channels are ordinarily used for intra-kernel communication inside an FPGA, but we extend them to external communication through the CoE system. In this paper, we introduce two benchmarks as a demonstration of the CoE system. We achieved 29.77 Gbps in throughput (approximately 75% of the theoretical peak of 40 Gbps) and 950 ns in latency on our system using the pingpong benchmark, which was implemented on Intel Arria10 FPGA. In addition, we evaluated the Himeno benchmark which is a sort of 3D-Computational Fluid Dynamics (CFD) on the system, and we achieved 23689 MFLOPS with 4 FPGAs on a problem of size M. 
We also notice strong scalability, with a 3.93 times speedup compared to a single FPGA run, on the same problem size.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"15 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132974365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
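The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the function names are illustrative, and the single-FPGA throughput is merely implied by dividing the 4-FPGA result by the reported speedup, not a value the abstract states.

```python
def fraction_of_peak(measured_gbps: float, peak_gbps: float) -> float:
    """Share of the theoretical link bandwidth that was actually achieved."""
    return measured_gbps / peak_gbps


def parallel_efficiency(speedup: float, devices: int) -> float:
    """Speedup divided by the ideal (linear) speedup for `devices` devices."""
    return speedup / devices


if __name__ == "__main__":
    # 29.77 Gbps measured against a 40 Gbps theoretical peak.
    print(f"link utilisation: {fraction_of_peak(29.77, 40.0):.3f}")
    # 3.93x speedup on 4 FPGAs for the same problem size (strong scaling).
    print(f"parallel efficiency: {parallel_efficiency(3.93, 4):.4f}")
    # Implied single-FPGA Himeno performance, derived rather than reported.
    print(f"implied single-FPGA MFLOPS: {23689 / 3.93:.0f}")
```

The link utilisation comes out at roughly 74.4%, consistent with the "approximately 75%" in the abstract, and a 3.93 times speedup on 4 FPGAs corresponds to a parallel efficiency of about 98%.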
Citations: 14
An Appropriate Computing System and Its System Parameters Selection Based on Bottleneck Prediction of Applications
K. Komatsu, Takumi Kishitani, Masayuki Sato, Hiroaki Kobayashi
{"title":"An Appropriate Computing System and Its System Parameters Selection Based on Bottleneck Prediction of Applications","authors":"K. Komatsu, Takumi Kishitani, Masayuki Sato, Hiroaki Kobayashi","doi":"10.1109/IPDPSW.2019.00127","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00127","url":null,"abstract":"Recent computing systems have different characteristics as they consist of various processors such as a scalar processor, an accelerator, and a vector processor. As calculation patterns best suited for individual systems are different and diverse, it is difficult to determine in advance which computing system is appropriate when an application is given. Furthermore, due to the increase in the complexity of computing systems such as the many-core and new memory technologies, it is necessary to adjust many system parameters and find appropriate system parameters for each application to deliver the high performance of a computing system. One of the ways to find a computing system and its system parameter combinations suitable for an HPC application is a brute-force approach of trial and error executions of an application on various computing systems and various system parameter combinations, which is no longer a realistic way due to its high cost in time and effort. This paper proposes a method to carefully select a candidate of a computing system and system parameter combinations appropriate for executions of an HPC application by considering both characteristics of the application and computing systems. 
Thus, the proposed method can narrow down a large search space of computing systems and their system parameter combinations suitable for executing the HPC application.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134303409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Analysis of Energy Efficiency of a Parallel AES Algorithm for CPU-GPU Heterogeneous Platforms
Xiongwei Fei, Kenli Li, Wangdong Yang, Kuan-Ching Li
{"title":"Analysis of Energy Efficiency of a Parallel AES Algorithm for CPU-GPU Heterogeneous Platforms","authors":"Xiongwei Fei, Kenli Li, Wangdong Yang, Kuan-Ching Li","doi":"10.1109/IPDPSW.2019.00091","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00091","url":null,"abstract":"Encryption plays an important role in protecting data, especially data transferred on the Internet. However, encryption is computationally expensive and this leads to high energy costs. Parallel encryption solutions using more CPU/GPU cores can achieve high performance. If we consider energy efficiency to be cost effective using parallel encryption solutions at the same time, this problem can be alleviated effectively. Because many CPU/GPU cores and encryption are pervasive currently, saving energy cost by parallel encrypting has become an unavoidable problem. In this paper, we propose an energy-efficient parallel Advanced Encryption Standard (AES) algorithm for CPU-GPU heterogeneous platforms. These platforms, such as the Green 500 computers, are popular in both high performance and general computing. Parallelizing AES, using both GPUs and CPUs, balances the workload between CPUs and GPUs based on their computing capacities. This approach also uses the Nvidia Management Library (NVML) to adjust GPU frequencies, overlaps data transfers and computation, and fully utilizes GPU computing resources to reduce energy consumption as much as possible. Experiments conducted on a platform with one K20M GPU and two Xeon E5-2640 v2 CPUs show that this approach can reduce energy consumption by 74% compared to CPU-only parallel AES and 21% compared to GPU-only parallel AES on the same platform. Its energy efficiency is 4.66 MB/Joule on average, higher than both CPU-only parallel AES (1.15 MB/Joule) and GPU-only parallel AES (3.65 MB/Joule). 
As an energy-efficient parallel AES solution, it can be used to encrypt data on heterogeneous platforms to save energy, especially for the computers with thousands of heterogeneous nodes.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115640706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
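The energy figures in this abstract are related by simple arithmetic, which the sketch below works through. It assumes that energy for a fixed amount of data is the data size divided by the efficiency in MB/Joule; the function names and the 1000 MB example size are illustrative and not taken from the paper.

```python
def energy_joules(data_mb: float, efficiency_mb_per_joule: float) -> float:
    """Energy needed to encrypt `data_mb` megabytes at a given efficiency."""
    return data_mb / efficiency_mb_per_joule


def energy_saving(baseline_eff: float, improved_eff: float) -> float:
    """Relative energy saved when moving from the baseline to the improved
    efficiency, for the same amount of data."""
    return 1.0 - baseline_eff / improved_eff


if __name__ == "__main__":
    hybrid, cpu_only, gpu_only = 4.66, 1.15, 3.65  # MB/Joule, from the abstract
    data_mb = 1000.0  # illustrative data size, not from the paper
    for name, eff in (("hybrid", hybrid), ("CPU-only", cpu_only), ("GPU-only", gpu_only)):
        print(f"{name}: {energy_joules(data_mb, eff):.1f} J for {data_mb:.0f} MB")
    print(f"saving vs CPU-only: {energy_saving(cpu_only, hybrid):.3f}")
    print(f"saving vs GPU-only: {energy_saving(gpu_only, hybrid):.3f}")
```

Under this assumption the average efficiencies imply savings of about 75% and 22%, close to the 74% and 21% reported. The abstract does not explain the small gap; one possibility is that the reported percentages were averaged over individual experiments rather than derived from the average efficiencies.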
Citations: 3
Influence of Tasks Duration Variability on Task-Based Runtime Schedulers
Olivier Beaumont, Lionel Eyraud-Dubois, Yihong Gao
{"title":"Influence of Tasks Duration Variability on Task-Based Runtime Schedulers","authors":"Olivier Beaumont, Lionel Eyraud-Dubois, Yihong Gao","doi":"10.1109/IPDPSW.2019.00013","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00013","url":null,"abstract":"In the context of HPC platforms, individual nodes nowadays consist of heterogeneous processing resources such as GPU units and multicores. Those resources share communication and storage resources, inducing complex co-scheduling effects, and making it hard to predict the exact duration of a task or of a communication. To cope with these issues, runtime dynamic schedulers such as StarPU have been developed. These systems base their decisions at runtime on the state of the platform and possibly on static priorities of tasks computed offline. In this paper, our goal is to quantify performance variability in the context of HPC heterogeneous nodes, by focusing on very regular dense linear algebra kernels, such as Cholesky and LU factorizations. We therefore first concentrate on the evaluation of the individual block-size kernels variability. 
Then, we analyze the impact of this variability at the scale of a full application on a dynamic runtime scheduler such as StarPU, in order to analyze whether the strategies that have been designed in the context of MapReduce applications to cope with stragglers could be transferred to HPC systems, or if the dynamic nature of runtime schedulers is enough to cope with actual performance variations, even in presence of task dependencies.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121027426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
[Copyright notice]
{"title":"[Copyright notice]","authors":"","doi":"10.1109/ipdpsw.2019.00003","DOIUrl":"https://doi.org/10.1109/ipdpsw.2019.00003","url":null,"abstract":"","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132405485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0