2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): Latest Papers

Case Study: Using Project Based Learning to Develop Parallel Programing and Soft Skills
Awad A. Younis, Rajshekhar Sunderraman, M. Metzler, A. Bourgeois
https://doi.org/10.1109/IPDPSW.2019.00059 (published 2019-05-20)
Abstract: In today's environment, where every computer, including cell phones, is multicore, it is essential that students develop parallel programming skills. It remains a challenge to develop effective techniques for teaching parallel programming skills. Another challenge is finding time within already packed lectures to cover additional material. To that end, we investigate the effectiveness of using Project Based Learning (PBL) to teach parallel programming skills early in the curriculum by developing and incorporating a PBL module into CSc 3210 (Computer Organization and Programming). This is a core course taken by all computer science majors and is a prerequisite for many of our senior-level classes. In our case study, 124 students are organized into 26 diverse groups, with four or five students per group, and assigned five project assignments, each of two weeks' duration. Given a Raspberry Pi, students explore its multicore architecture and create programs for shared-memory parallelism using OpenMP and the C language. Our results show that incorporating this PBL module has a significant and direct effect on students' growth in parallel programming skills. As a side benefit, we also show a direct improvement in students' personal growth in terms of soft skills, which are essential for professional development and success in the workplace. Because students experience PBL in an early class, close to the midpoint of the academic program, it can serve as a mini-capstone project. Furthermore, students can learn collaboratively by themselves (through teamwork) and apply the fundamentals of parallel programming without the need for separate lectures, labs, or workshops.
Citations: 9
A Distributed Wheel Sieve Algorithm
G. Paillard, F. França, C. Lavault
https://doi.org/10.1109/IPDPSW.2019.00107 (published 2019-05-20)
Abstract: This paper presents a new distributed approach for generating all prime numbers in a given interval of integers. From Eratosthenes, who devised the first prime sieve more than 2000 years ago, to the current generation of parallel computers, which have made it possible to reach larger bounds on the interval or to obtain previous results in less time, prime number generation remains an attractive domain of research and plays a central role in cryptography. We propose a fully distributed algorithm for finding all primes in the interval [2; n], based on the wheel sieve and the SMER (Scheduling by Multiple Edge Reversal) multigraph dynamics, which runs in O(√n) computational complexity, close to the theoretical lower bound on sieve methods, that is O(n), without making use of preprocessing techniques.
Citations: 0
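For readers unfamiliar with wheel-based sieving, the idea behind the sequential building block mentioned in the abstract above can be sketched in Python. This is a minimal single-process sketch that combines wheel factorization with a sieve of Eratosthenes, assuming a fixed wheel basis of (2, 3, 5). It is not the authors' distributed SMER-based algorithm, and the function name `wheel_sieve` is hypothetical.

```python
from math import isqrt, prod
from typing import List, Sequence


def wheel_sieve(n: int, basis: Sequence[int] = (2, 3, 5)) -> List[int]:
    """Return all primes in [2, n] via wheel factorization plus Eratosthenes.

    Sequential sketch only, not the distributed SMER-based algorithm.
    Assumes `basis` holds the first few primes in increasing order.
    """
    if n < 2:
        return []
    m = prod(basis)  # wheel circumference: 30 for the default basis
    # Spokes: residues in 1..m that are coprime to every basis prime.
    spokes = [r for r in range(1, m + 1) if all(r % p for p in basis)]
    # Mark every wheel candidate k*m + r <= n (1 excluded; basis primes re-added below).
    is_candidate = bytearray(n + 1)
    for base in range(0, n + 1, m):
        for r in spokes:
            v = base + r
            if v > n:
                break  # spokes are increasing, so the remaining ones overshoot too
            if v > 1:
                is_candidate[v] = 1
    # Eratosthenes pass: each surviving candidate p <= sqrt(n) crosses off multiples from p*p.
    for p in range(2, isqrt(n) + 1):
        if is_candidate[p]:
            for multiple in range(p * p, n + 1, p):
                is_candidate[multiple] = 0
    primes = [p for p in basis if p <= n]
    primes.extend(v for v in range(2, n + 1) if is_candidate[v])
    return primes
```

For example, `wheel_sieve(30)` returns `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`. The wheel removes all multiples of the basis primes before any sieving happens, so only 8 of every 30 integers are ever marked as candidates. Distributing that remaining work is the problem the paper addresses.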
Performance Evaluation of the MODYLAS Application on Modern Multi-core and Many-Core Environments
S. Ohshima, Soichiro Suzuki, Tatsuya Sakashita, M. Ogino, T. Katagiri, Yoshimichi Andoh
https://doi.org/10.1109/IPDPSW.2019.00129 (published 2019-05-20)
Abstract: Molecular dynamics (MD) simulations are an essential tool in various fields of science. To broaden the range of applications of MD simulations, faster and larger-scale simulations that achieve the same accuracy are required. The authors have developed an MD application named MODYLAS over several years. In our previous work, we focused on the pairwise additive calculation of potentials and forces, which is one of the hot spots in MD simulations. We proposed new thread-level algorithms and evaluated their performance on the FX100 supercomputer system. In this study, we measure the performance of these algorithms on Skylake-SP and Knights Landing processors and compare the results with FX100. Using the obtained results, we discuss application performance on this hardware and the potential performance improvement from auto-tuning.
Citations: 0
Activity Based Approach for Teaching Parallel Computing: An Indian Experience
P. Chitra, S. Ghafoor
https://doi.org/10.1109/IPDPSW.2019.00057 (published 2019-05-20)
Abstract: Due to the rapid growth of multicore and GPU-based computing devices, teaching parallel computing in the CS/CE curriculum has become almost mandatory. A course on Parallel Computing Systems (PCS) has been designed to provide an understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, and to teach the parallel programming techniques necessary to use these machines effectively. An activity-based learning approach was adopted for teaching the course, and several parallel programming paradigms and technologies, such as OpenMP, MPI, and CUDA, were covered. The course was offered as a required course to graduate students. This paper describes the implementation of the course at Thiagarajar College of Engineering. Evaluation of the course reveals that, for students who had not been exposed to parallel and distributed computing, i) activity-based learning results in better knowledge gain than the traditional approach, ii) learning OpenMP was much easier than learning MPI or CUDA, iii) some Parallel and Distributed Computing (PDC) concepts, such as false sharing, were harder to grasp than basic concepts, and iv) it is essential to introduce parallel computing in the undergraduate curriculum.
Citations: 5
AHEAD: A Tool for Projecting Next-Generation Hardware Enhancements on GPU-Accelerated Systems
Hazem A. Abdelhafez, Christopher Zimmer, Sudharshan S. Vazhkudai, M. Ripeanu
https://doi.org/10.1109/IPDPSW.2019.00103 (published 2019-05-20)
Abstract: Starting with the Titan supercomputer (at the Oak Ridge Leadership Computing Facility, OLCF) in 2012, top supercomputers have increasingly leveraged the performance of GPUs to support large-scale computational science. The current No. 1 machine, the 200-petaflop Summit system at OLCF, is GPU-based. Accelerator-based architectures, however, add complexity due to node heterogeneity. To inform procurement decisions, supercomputing centers need tools to quickly model the impact of changes in node architecture on application performance. We present AHEAD, a profiling and modeling tool that quantifies the impact of the intra-node communication mechanism (e.g., PCI or NVLink) on application performance. Our experiments show average weighted relative errors of ~19% and ~23% for five CORAL-2 benchmarks (CORAL-2 is a collaboration between multiple US Department of Energy (DOE) labs to procure exascale systems) and 12 Rodinia benchmarks, respectively, without running the applications on the target future node.
Citations: 1
VirtP4: An Architecture for P4 Virtualization
Mateus Saquetti, Guilherme Bueno, Weverton Cordeiro, J. Azambuja
https://doi.org/10.1109/IPDPSW.2019.00021 (published 2019-05-20)
Abstract: This paper presents VirtP4, an architecture for the virtualization of P4-based programmable forwarding planes. VirtP4 provides parallel execution of truly independent virtual switch instances with the assistance of traffic control and packet routing. The architecture is implemented on a NetFPGA-SUME board running two virtual switches: an L2 switch and a router. Area occupation data show that up to 13 P4 instances could be implemented in parallel. Compared to related work, performance results show improvements of up to 3 orders of magnitude in bandwidth and 2 orders of magnitude in latency.
Citations: 8
Constraint Embedding for Solving Optimization Problems on Quantum Annealers
Tomás Vyskocil, H. Djidjev
https://doi.org/10.1109/IPDPSW.2019.00109 (published 2019-05-20)
Abstract: Quantum annealers such as the commercially available D-Wave machines are designed to natively solve quadratic unconstrained binary optimization (QUBO) problems. While most well-known NP-hard optimization problems can easily be formulated as quadratic binary problems, such formulations also contain constraints, which are commonly added to the objective function as penalties to obtain a QUBO version. However, the standard method for defining such penalties leads to dense QUBOs that consume too many of the quantum annealer's resources. In this paper, we describe an alternative approach to the constraint embedding problem that uses mixed-integer linear programming (MILP) and scales to problems with an arbitrary number of variables.
Citations: 2
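The abstract above contrasts the authors' MILP-based approach with the standard penalty construction. The Python sketch below illustrates only that standard construction for a single one-hot constraint (exactly one x_i equals 1), assuming a linear objective with costs c_i. Because x_i^2 = x_i for binary variables, the penalty P(sum_i x_i - 1)^2 expands to P(-sum_i x_i + 2 sum_{i<j} x_i x_j + 1), so every pair of variables in the constraint receives a coupling term. The function names and the example penalty weight are hypothetical, and the brute-force search is only there to check tiny instances.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Qubo = Dict[Tuple[int, int], float]


def one_hot_qubo(costs: Sequence[float], penalty: float) -> Qubo:
    """QUBO for: minimise sum_i c_i x_i subject to sum_i x_i == 1 (standard penalty).

    penalty * (sum_i x_i - 1)^2 = penalty * (-sum_i x_i + 2 * sum_{i<j} x_i x_j + 1)
    for binary x; the constant term is dropped.
    """
    n = len(costs)
    q: Qubo = {}
    for i in range(n):
        q[(i, i)] = costs[i] - penalty  # linear terms sit on the diagonal
    for i in range(n):
        for j in range(i + 1, n):
            q[(i, j)] = 2.0 * penalty  # every pair is coupled, so the QUBO is dense
    return q


def energy(q: Qubo, x: Sequence[int]) -> float:
    """Evaluate sum over (i, j) of Q_ij * x_i * x_j."""
    return sum(w * x[i] * x[j] for (i, j), w in q.items())


def brute_force_argmin(q: Qubo, n: int) -> Tuple[int, ...]:
    """Exhaustive search over all 2**n assignments; only feasible for tiny n."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(q, x))
```

For costs `[3.0, 1.0, 2.0]` and a penalty of `10.0`, `brute_force_argmin` returns `(0, 1, 0)`, the cheapest one-hot choice. A one-hot constraint over n variables contributes n(n-1)/2 quadratic terms, which illustrates the density problem the paper targets. If the penalty weight is too small (for example 0 here), the minimum can violate the constraint.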
Simulation Framework for Studying Optical Cable Failures in Dragonfly Topologies
Tiffany Connors, Taylor L. Groves, Tony Quan, K. Hemmert
https://doi.org/10.1109/IPDPSW.2019.00141 (published 2019-05-20)
Abstract: In high performance computing (HPC) systems, optical links are often used in the interconnection network, but they have a relatively high failure rate compared to their electrical counterparts. Because of this high link failure rate, evaluating the impact of these failures on HPC workloads is of particular interest. We extended the Merlin network module of the Structural Simulation Toolkit (SST) to evaluate the impact of link failures on applications running on HPC systems that use dragonfly network topologies. We focus on dragonfly topologies because they are frequently found in HPC systems, including the NERSC Cori and Edison systems. We demonstrate our changes to SST by providing a sample of performance results and routing statistics for a dragonfly network of 8,192 nodes running three HPC workloads with 1% of optical links failed. For the three motifs under consideration, we show that the impact of link failure depends largely on the workloads running on the system.
Citations: 5
A GP Hyper-Heuristic Approach for Generating TSP Heuristics
Gabriel Duflo, Emmanuel Kieffer, Matthias R. Brust, Grégoire Danoy, P. Bouvry
https://doi.org/10.1109/IPDPSW.2019.00094 (published 2019-05-20)
Abstract: A wide range of heuristics has been developed over the last decades to obtain good-quality solutions in reasonable time for large-scale optimisation problems. However, heuristics are problem-specific, i.e., they lack generalisation potential, and they take time to design. Hyper-heuristics have been proposed to address these limitations by searching directly in the space of heuristics. This work focuses specifically on a heuristic generation method, as opposed to heuristic selection, for the travelling salesman problem (TSP). Learning is achieved with a genetic programming (GP) approach, for which novel problem-specific terminals are introduced. The performance of the proposed GP hyper-heuristic is evaluated on a large set of TSP instances and compared to state-of-the-art heuristics. Experiments demonstrate that the generated heuristics outperform existing ones while having similar or lower complexity.
Citations: 20
A Case Study for an Accelerated DCNN on FPGA-Based Embedded Distributed System
Anna Maria Nestorov, Alberto Scolari, Enrico Reggiani, Luca Stornaiuolo, M. Santambrogio
https://doi.org/10.1109/IPDPSW.2019.00025 (published 2019-05-20)
Abstract: Face detection (FD) has recently become the basis of multiple applications that require low latency but have limited resource and energy budgets. Deep Convolutional Neural Networks (DCNNs) are especially accurate for FD, but latency requirements and energy budgets call for solutions based on Field Programmable Gate Arrays (FPGAs), trading off flexibility and efficiency. Nonetheless, the range of FPGA solutions is limited, and different chips often require expensive re-design phases, while developers want solutions whose resources can scale in proportion to demand. Therefore, this work presents an FD solution based on a DCNN on a distributed, embedded system with FPGAs, proposing a general approach to reduce the DCNN size and to design its FPGA cores, and investigating its accuracy, performance, and energy efficiency.
Citations: 3