{"title":"Parallel Circuit Simulation on Multi/Many-core Systems","authors":"Xiaoming Chen, Yu Wang, Huazhong Yang","doi":"10.1109/IPDPSW.2012.319","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.319","url":null,"abstract":"SPICE is widely used for transistor-level circuit simulation. However, with the growing complexity of the VLSI at nano-scale, the traditional SPICE simulator has become inefficient to provide accurate verifications. This thesis tries to accelerate transistor-level simulation on multi/many-core systems, and we will solve 3 problems: 1) develop a parallel sparse LU factorization algorithm for circuit simulation, 2) implement the matrix solver on GPU to further accelerate the solver, 3) develop a circuit partitioning based parallel simulation approach on distributed machines to obtain better scalability. The experimental results show that the proposed parallel LU factorization algorithm effectively accelerates the matrix solver for circuit simulation on both CPU and GPU.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123875327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelizing the Computation of Green Functions for Computational Electromagnetism Problems","authors":"C. Pérez-Alcaraz, D. Giménez, Alejandro Álvarez Melcón, F. Quesada-Pereira","doi":"10.1109/IPDPSW.2012.174","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.174","url":null,"abstract":"Green functions are used in various fields to solve non homogeneous integral equations with boundary conditions. In some cases it is necessary to obtain these functions in real time or they are used for big problems with a large execution time which should be reduced. To do so, efficient algorithms for the available computational systems must be developed. With the evolution of technology, the computational systems are now parallel, with multicore laptops or desktops with programmable graphic processing units, and with clusters or supercomputers composed by multicore nodes. In this paper, algorithms for Green functions using different parallelism paradigms are developed and compared. The Green functions used are from computational electromagnetism, and important reductions in the execution time are obtained for typical problems with the implemented algorithms.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125872815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Power and Energy Usage of HPC Kernels","authors":"Ananta Tiwari, M. Laurenzano, L. Carrington, A. Snavely","doi":"10.1109/IPDPSW.2012.121","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.121","url":null,"abstract":"Compute intensive kernels make up the majority of execution time in HPC applications. Therefore, many of the power draw and energy consumption traits of HPC applications can be characterized in terms of the power draw and energy consumption of these constituent kernels. Given that power and energy-related constraints have emerged as major design impediments for exascale systems, it is crucial to develop a greater understanding of how kernels behave in terms of power/energy when subjected to different compiler-based optimizations and different hardware settings. In this work, we develop CPU and DIMM power and energy models for three extensively utilized HPC kernels by training artificial neural networks. These networks are trained using empirical data gathered on the target architecture. The models utilize kernel-specific compiler-based optimization parameters and hardware tunables as inputs and make predictions for the power draw rate and energy consumption of system components. The resulting power draw and energy usage predictions have an absolute error rate that averages less than 5.5% for three important kernels - matrix multiplication (MM), stencil computation and LU factorization.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129666601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Reconfiguration Algorithm for Three-dimensional VLSI Arrays","authors":"Guiyuan Jiang, W. Jigang, Ji-zhou Sun","doi":"10.1109/IPDPSW.2012.29","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.29","url":null,"abstract":"Reconfigurable VLSI array is a well known fault tolerant architecture for parallel computing, but few reconfiguration approaches are reported so far for three-dimensional (3D) arrays due to the high complexity of reconfiguration. This paper is devoted to develop reconfiguration algorithm for three-dimensional degradable VLSI arrays. Three bypass schemes and three rerouting schemes are proposed to reconfigure a 3D host array with faults resulting in a target sub-array without faults. Moreover, a heuristic algorithm based on plane rerouting is proposed to construct a target sub-array on the selected rows and columns. It is also proved that the reconfiguration problem considered in this paper on the selected rows and columns(MPSRC) can be optimally solvable in linear time. Empirical study shows that the proposed algorithm produces target arrays with good harvest for the case of the fault rate no more than 5%, that is often occurred in real applications.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125058038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing Dynamic Reconfiguration for Fault-tolerance on a Manycore Architecture","authors":"Z. Ul-Abdin, Essayas Gebrewahid, B. Svensson","doi":"10.1109/IPDPSW.2012.38","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.38","url":null,"abstract":"With the advent of many core architectures comprising hundreds of processing elements, fault management has become a major challenge. We present an approach that uses the occam-pi language to manage the fault recovery mechanism on a new many core architecture, the Platform 2012 (P2012). The approach is made possible by extending our previously developed compiler framework to compile occam-pi implementations to the P2012 architecture. We describe the techniques used to translate the salient features of the occam-pi language to the native programming model of the P2012 architecture. We demonstrate the applicability of the approach by an experimental case study, in which the DCT algorithm is implemented on a set of four processing elements. During run-time, some of the tasks are then relocated from assumed faulty processing elements to the faultless ones by means of dynamic reconfiguration of the hardware. The working of the demonstrator and the simulation results illustrate not only the feasibility of the approach but also how the use of higher-level abstractions simplifies the fault handling.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"415 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128831321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Portable High-Productivity Approach to Program Heterogeneous Systems","authors":"Z. Bozkus, B. Fraguela","doi":"10.1109/IPDPSW.2012.15","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.15","url":null,"abstract":"The exploitation of heterogeneous resources is becoming increasingly important for general purpose computing. Unfortunately, heterogeneous systems require much more effort to be programmed than the traditional single or even multi-core computers most programmers are familiar with. Not only new concepts, but also new tools with different restrictions must be learned and applied. Additionally, many of these approaches are specific to one vendor or device, resulting in little portability or rapid obsolescence for the applications built on them. Open standards for programming heterogeneous systems such as OpenCL contribute to improve the situation, but the requirement of portability has led to a programming interface more complex than that of other approaches. In this paper we present a novel library-based approach to programming heterogeneous systems that couples portability with ease of use. Our evaluations indicate that while the performance of our library, called Heterogeneous Programming Library (HPL), is on par with that of OpenCL, the current standard for portable heterogeneous computing, the programming effort required by HPL is 3 to 10 times smaller than that of OpenCL based on the authors' implementation of five benchmarks.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115960570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Courses in High-performance Computing for Scientists and Engineers","authors":"R. Vuduc, Kenneth Czechowski, Aparna Chandramowlishwaran, JeeWhan Choi","doi":"10.1109/IPDPSW.2012.169","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.169","url":null,"abstract":"This paper reports our experiences in reimplementing an entry-level graduate course in high-performance parallel computing aimed at physical scientists and engineers. These experiences have directly informed a significant redesign of a junior/senior undergraduate course, Introduction to High-Performance Computing (CS 4225 at Georgia Tech), which we are implementing for the current Spring 2012 semester. Based on feedback from the graduate version, the redesign of the undergraduate course emphasizes peer instruction and hands-on activities during the traditional lecture periods, as well as significant time for end-to-end projects. This paper summarizes our anecdotal findings from the graduate version's exit surveys and briefly outlines our plans for the undergraduate course.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"830 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126601733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of XcalableMP Device Acceleration Extention with OpenCL","authors":"Takuma Nomizu, D. Takahashi, Jinpil Lee, T. Boku, M. Sato","doi":"10.1109/IPDPSW.2012.296","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.296","url":null,"abstract":"Due to their outstanding computational performance, many acceleration devices, such as GPUs, the Cell Broadband Engine (Cell/B.E.), and multi-core computing are attracting a lot of attention in the field of high-performance computing. Although there are many programming models and languages designed for programming accelerators, such as CUDA, AMD Accelerated Parallel Processing (AMD APP), and OpenCL, these models remain difficult and complex. Furthermore, when programming for accelerator-enhanced clusters, we have to use an inter-node programming interface, such as MPI to coordinate the nodes. In order to address these problems and reduce complexity, an extension to XcalableMP (XMP), a PGAS language, for use on accelerator-enhanced clusters, called XcalableMP Device Acceleration Extension (XMP-dev), is proposed. In XMP-dev, a global distributed data is mapped onto distributed memory of each accelerator, and a fragment of codes can be offloaded to execute in a set of accelerators. It eliminates the complex programming between nodes and accelerators and between nodes. In this paper, we present an implementation of the XMP-dev runtime library with the OpenCL APIs, while the previous implementation targets CUDA-only. Since OpenCL is a standardized interface supported for various kinds of accelerators, it improves the portability of XMP-dev and reduces the cost of development. In the result of performance evaluation, we show that the OpenCL implementation of XMP-dev can generate portable programs that can run on not only NVIDIA GPU-enhanced clusters but also various accelerator-enhanced clusters.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126488215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deploying Scalable and Secure Secret Sharing with GPU Many-Core Architecture","authors":"Su Chen, Ling Bai, Yi Chen, Hai Jiang, Kuan-Ching Li","doi":"10.1109/IPDPSW.2012.173","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.173","url":null,"abstract":"Secret sharing is an excellent alternative to the traditional cryptographic algorithms due to its unkeyed encryption/decryption and fault tolerance features. Key management hassle faced in most encryption strategies is removed from users and the loss of a certain number of data copies can be tolerated. However, secret sharing schemes have to deal with two contradictory design goals: security and performance. Without keys' involvement, large security margin is expected for the illusion of being computationally secure. In the meantime, such design will degrade the performance of \"encrypting\" and \"decrypting\" secrets. Thus, secret sharing is mainly for small data such as keys and passwords. In order to apply secret sharing to large data sets, this paper redesigned the original schemes to balance the security and performance. With sufficient security margin, Graphics Processing Unit (GPU) is adopted to provide the performance satisfaction. The proposed secret sharing scheme with GPU acceleration is a practical choice for large volume data security. It is particularly good for long-term storage for its unkeyed encryption and fault tolerance. Performance analysis and experimental results have demonstrated the effectiveness and efficiency of the proposed scheme.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127789962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Engineering a New Curriculum: Experiences at Ohio University in Incorporating the IEEE-TCPP Curriculum Initiative During a Transition to Semesters","authors":"D. Juedes, Frank Drews","doi":"10.1109/IPDPSW.2012.167","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.167","url":null,"abstract":"This paper describes the efforts at Ohio University to incorporate selected topics from the IEEE-TCPP Curriculum Initiative into the Computer Science/Computer Engineering curriculum prior to a transition to semesters at Ohio University that will occur in the Fall of 2012. In particular, this paper describes our efforts to incorporate (and evaluate) selected elements of the IEEE-TCPP Curriculum Initiative into three courses in order to best determine the appropriate placement of topics related to parallel and distributed computing in the new CS/CpE curriculum under the semester calendar. In particular, we plan to add or revise existing modules and assignments for CS2 (CS 240B, CS 240C at Ohio University, CS 2401 under semesters), DS/A (CS 361 Data Structures at Ohio University, CS 3610 under semesters), and Systems (CS 442 Operating Systems and Computer Architecture I, CS 4420 under semesters) to help us determine which curricular recommendations belong in those three courses in the new semesters curriculum and which topics are more appropriately placed in new required courses entitled EE 3613 Computer Organization and CS 4000 Introduction to Parallel, Distributed, and Web-Centric Computing or in other existing advanced courses such as CS 4040 Design and Analysis of Algorithms or CS 4100 Formal Languages and Compilers.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129103267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}