{"title":"GABB Introduction","authors":"T. Mattson, David A. Bader, A. Buluç, J. Gilbert, Joseph E. Gonzalez, J. Kepner","doi":"10.1109/IPDPSW.2014.221","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.221","url":null,"abstract":"The Basic Linear Algebra Subprograms (BLAS), introduced over 30 years ago, had a transformative effect on linear algebra. By building Linear Algebra algorithms from a common set of highly optimized building blocks, researchers spend less time mapping algorithms onto specific hardware features and more time on interesting new algorithms. Could the same transformation occur for Graph algorithms? Can Graph algorithm researchers converge around a core set of building blocks so we can focus more on algorithms and less on mapping software onto hardware? The Graph Algorithms Building Blocks workshop (GABB'14) will address these questions. The workshop will open with a pair of talks that define a candidate set of graph algorithm building blocks that we call the “Graph BLAS”. With this context established, the remaining talks explore issues raised by these Graph BLAS, suggest alternative sets of low-level building blocks, and finally consider lessons learned from past standards efforts. We will close with an interactive panel about our collective quest to standardize a set of core graph algorithm building blocks.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123174164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New Algorithm for Computing Eigenvectors of the Symmetric Eigenvalue Problem","authors":"A. Haidar, P. Luszczek, J. Dongarra","doi":"10.1109/IPDPSW.2014.130","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.130","url":null,"abstract":"We describe a design and implementation of a multi-stage algorithm for computing eigenvectors of a dense symmetric matrix. We show that reformulating the existing algorithms is beneficial in terms of performance even if that doubles the computational complexity. Through detailed analysis, we show that the effect of the increase in the asymptotic operation count may be compensated by a much improved performance rate. Our performance results indicate that using our approach achieves very good speedup and scalability even when directly compared with the existing state-of-the-art software.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121211129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Genetic Algorithm-Based Sparse Coverage over Urban VANETs","authors":"Huang Cheng, Xin Fei, A. Boukerche, M. Almulla","doi":"10.1109/IPDPSW.2014.59","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.59","url":null,"abstract":"Vehicular ad hoc networks have emerged as a promising area of research in academic fields. However, designing a realistic coverage algorithm for vehicular networks presents a challenge due to the irregularity of the service area, assorted mobility patterns, and resource constraints. In order to resolve these problems, this paper proposes a genetic algorithm-based sparse coverage with statistical analysis, which aims to consider the geometrical attributes of road networks, movement patterns of vehicles and resource limitations. By taking the dimensions of road segments into account, our coverage algorithm provides a buffering operation to suit different types of road topology. By discovering hotspots from the historical trace files, our coverage algorithm can depict the mobility patterns and discover the most valuable regions of a road system. We model this resource-constrained problem as an NP-hard budget coverage problem and solve it with a genetic algorithm. The simulation results verify that our coverage is reliable and scalable for urban vehicular networks.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116648808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPGC Introduction and Committees","authors":"E. Aubanel, V. Bhavsar, M. Frumkin","doi":"10.1109/IPDPSW.2014.216","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.216","url":null,"abstract":"This eleventh HPGC workshop provides a forum for researchers and engineers to present their results in the areas of cloud and distributed computing. This year we accepted 6 submissions, which are organized into two sessions. This includes one submission from the CloudFlow workshop, which was merged with HPGC this year. We thank Yong Zhao for collaborating with us in the selection of the CloudFlow contribution.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"838 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114059037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Extreme-Scale Simulations with Next-Generation Trilinos: A Low Mach Fluid Application Case Study","authors":"P. Lin, M. Bettencourt, S. Domino, T. Fisher, M. Hoemmen, Jonathan J. Hu, E. Phipps, A. Prokopenko, S. Rajamanickam, C. Siefert, E. Cyr, S. Kennon","doi":"10.1109/IPDPSW.2014.166","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.166","url":null,"abstract":"Trilinos is an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. While the original version of Trilinos was designed for highly scalable solutions for large problems, the need for increasingly higher fidelity simulations has pushed the problem sizes beyond what could have been envisioned two decades ago. When problem sizes exceed a billion elements even highly scalable applications and solver stacks require a complete revision. The next-generation Trilinos employs C++ templates in order to solve arbitrarily large problems and enable extreme-scale simulations. We present a case study that involves integration of Trilinos with an engineering application (Sierra low Mach module/Nalu), involving the simulation of low Mach fluid flow for problems of size up to nine billion elements. Through the use of improved algorithms and better software engineering practices, we demonstrate good weak scaling for the matrix assembly and solve for the engineering application for up to a nine billion element fluid flow large eddy simulation (LES) problem on unstructured meshes with a 27 billion row matrix on 131,072 cores of a Cray XE6 platform.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127628754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Empirical Research of Virtual Enterprise Knowledge Transfer's Effectiveness Faced to the Independent Innovation Ability","authors":"Yang Bo, N. Xiong, Wenzhong Guo","doi":"10.1109/IPDPSW.2014.186","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.186","url":null,"abstract":"Based on the theory of knowledge transfer and on organizational characteristics, the paper takes member enterprises' knowledge transfer behaviors and the innovation of knowledge, technology and management as research objects to study how effectively the successful use of Virtual Enterprise Knowledge Transfer promotes enterprises' independent innovation ability. Having analyzed the influence of Virtual Enterprise Knowledge Transfer on the independent innovation ability of enterprises, the paper constructs a conceptual model of the effectiveness of Virtual Enterprise Knowledge Transfer in promoting that ability. It then illustrates the study using a structural equation model and statistical software. The results indicate that the coalition of Virtual Enterprise Knowledge Transfer strongly promotes the knowledge and technology innovation of member enterprises. Furthermore, the paper offers solutions and suggestions for the process of Virtual Enterprise Knowledge Transfer to improve the independent innovation ability of member enterprises.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127808679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing the Impact of Program Optimizations on Power and Energy for Explicit Hydrodynamics","authors":"E. León, I. Karlin","doi":"10.1109/IPDPSW.2014.89","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.89","url":null,"abstract":"With the end of Dennard scaling, future systems will be constrained by power and energy. This will impact application developers by forcing them to restructure and optimize their algorithms in terms of these resources. In this paper, we analyze the impact of different code optimizations on power, energy, and execution time. Our optimizations include loop fusion, data structure transformations, global allocation, and compiler selection. We analyze the static and dynamic components of power and energy as applied to the processor chip and memory domains within a system. In addition, our analysis correlates energy and power changes with performance events and shows that data motion is highly correlated with memory power and energy usage and executed instructions are partially correlated with processor power and energy. Our results demonstrate key tradeoffs among power, energy, and execution time for explicit hydrodynamics via a representative kernel. In particular, we observe that loop fusion and compiler selection improve all objectives, while global allocation and data layout transformations present tradeoffs that are objective-dependent.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"218 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134394787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application Level Fault Recovery: Using Fault-Tolerant Open MPI in a PDE Solver","authors":"Md. Mohsin Ali, James A. Southern, P. Strazdins, B. Harding","doi":"10.1109/IPDPSW.2014.132","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.132","url":null,"abstract":"A fault-tolerant version of Open Message Passing Interface (Open MPI), based on the draft User Level Failure Mitigation (ULFM) proposal of the MPI Forum's Fault Tolerance Working Group, is used to create fault-tolerant applications. This allows applications and libraries to design their own recovery methods and control them at the user level. However, only a limited amount of research work on user level failure recovery (including the implementation and performance evaluation of this prototype) has been carried out. This paper contributes a fault-tolerant implementation of an application solving 2D partial differential equations (PDEs) by means of a sparse grid combination technique which is capable of surviving multiple process failures caused by the faults. Our fault recovery involves reconstructing the faulty communicators without shrinking the global size by re-spawning failed MPI processes on the same physical processors where they were before the failure (for load balancing). It also involves restoring lost data from either exact checkpointed data on disk, approximated data in memory (via an alternate sparse grid combination technique) or a near-exact copy of replicated data in memory. The experimental results show that the faulty communicator reconstruction time is currently large in the draft ULFM, especially for multiple process failures. They also show that the alternate combination technique has the lowest data recovery overhead, except on a system with very low disk write latency for which checkpointing has the lowest overhead. Furthermore, the errors due to the recovery of approximated data are within a factor of 10 in all cases, with the surprising result that the alternate combination technique is more accurate than the near-exact replication method. The contributed implementation details, including the analysis of the experimental results, of this paper will help application developers to resolve different issues of design and implementation of fault-tolerant applications by means of the Open MPI ULFM standard.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130375847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing the Join Operation on Hive to Accelerate Cross-Matching in Astronomy","authors":"Liang Li, Dixin Tang, Taoying Liu, Hong Liu, Wei Li, Chenzhou Cui","doi":"10.1109/IPDPSW.2014.193","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.193","url":null,"abstract":"Cross-matching in astronomy is a basic procedure for comprehensively analyzing the relations among different celestial objects. The aim is to search celestial objects in different catalogs and to determine if they are the same object. Basically, cross-matching can be expressed as a join query statement. Since celestial catalogs usually contain billions of stars, the join operator must be carefully designed and optimized for efficiency. In this paper, we focus on fulfilling cross-matching by MapReduce based join operators. The challenge is how to optimize the join operators to satisfy specific requirements of cross-matching. Therefore, we propose an optimized method and investigate its efficiency by theoretical analysis and experiment. Our study shows that the method is a remarkable improvement over previous work, especially when the data is very large.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117300013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Generation of Large Task Network Mappings","authors":"Karl-Eduard Berger, François Galea, B. L. Cun, Renaud Sirdey","doi":"10.1109/IPDPSW.2014.170","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.170","url":null,"abstract":"In the context of networks of massively parallel execution models, optimizing the locality of inter-process communication is a major performance issue. We propose two heuristics to solve a dataflow process network mapping problem, where a network of communicating tasks is placed into a set of processors with limited resource capacities, while minimizing the overall communication bandwidth between processors. Those approaches are designed to tackle instances of over one hundred thousand tasks in acceptable time.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115784745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}