{"title":"Leveraging Linear Algebra to Count and Enumerate Simple Subgraphs","authors":"Vitaliy Gleyzer, Andrew J. Soszynski, E. Kao","doi":"10.1109/HPEC43674.2020.9286191","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286191","url":null,"abstract":"Even though subgraph counting and subgraph matching are well-known NP-Hard problems, they are foundational building blocks for many scientific and commercial applications. In order to analyze graphs that contain millions to billions of edges, distributed systems can provide computational scalability through search parallelization. One recent approach for exposing graph algorithm parallelization is through a linear algebra formulation and the use of the matrix multiply operation, which conceptually is equivalent to a massively parallel graph traversal. This approach has several benefits, including 1) a mathematically-rigorous foundation, and 2) ability to leverage specialized linear algebra accelerators and high-performance libraries. In this paper we explore and define a linear algebra methodology for performing exact subgraph counting and matching for 4-vertex subgraphs excluding the clique. Matches on these simple subgraphs can be joined as components for a larger subgraph. 
With thorough analysis we demonstrate that the linear algebra formulation leverages path aggregation, which allows it to be up to 2x to 5x more efficient in traversing the search space and compressing the results compared to tree-based subgraph matching techniques.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128024059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GraphSDH: A General Graph Sampling Framework with Distribution and Hierarchy","authors":"Jingbo Hu, Guohao Dai, Yu Wang, Huazhong Yang","doi":"10.1109/HPEC43674.2020.9286173","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286173","url":null,"abstract":"Large-scale graphs play a vital role in various applications, but their use is limited by long processing times. Graph sampling is an effective way to reduce the amount of graph data and accelerate graph algorithms. However, previous work usually lacks theoretical analysis related to graph algorithm models. In this study, GraphSDH (Graph Sampling with Distribution and Hierarchy), a general large-scale graph sampling framework, is established based on the vertex-centric graph model. For four common sampling techniques, we derive the sampling probability that minimizes the variance, and optimize the design according to whether there is a pre-estimation process for the intermediate value. To further improve the accuracy of the graph algorithm, we propose a stratified sampling method based on vertex degree and a hierarchical optimization scheme based on sampling position analysis. Extensive experiments on large graphs show that GraphSDH can achieve over 95% accuracy for PageRank by sampling only 10% of the edges of the original graph, and can speed up PageRank several-fold compared with the non-sampling case. Compared with random neighbor sampling, GraphSDH can reduce the mean relative error of PageRank by about 17% at a sampling neighbor ratio (sampling fraction) of 20%.
Furthermore, GraphSDH can be applied to various graph algorithms, such as Breadth-First Search (BFS), Alternating Least Squares (ALS) and Label Propagation Algorithm (LPA).","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125821806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Feasibility Study for MPI over HDFS","authors":"Wu-chun Feng, Da Zhang, Jing Zhang, Kaixi Hou, S. Pumma, Hao Wang","doi":"10.1109/HPEC43674.2020.9286250","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286250","url":null,"abstract":"With the increasing prominence of integrating high-performance computing (HPC) with big-data (BIGDATA) processing, running MPI over the Hadoop Distributed File System (HDFS) offers a promising approach for delivering better scalability and fault tolerance to traditional HPC applications. However, it comes with challenges that discourage such an approach: (1) two-sided MPI communication to support intermediate data processing, (2) a focus on enabling N-1 writes that is subject to the default HDFS block-placement policy, and (3) a pipelined writing mode in HDFS that cannot fully utilize the underlying HPC hardware. So, while directly integrating MPI with HDFS may deliver better scalability and fault tolerance to MPI applications, it will fall short of delivering competitive performance. Consequently, we present a performance study to evaluate the feasibility of integrating MPI applications to run over HDFS. Specifically, we show that by aggregating and reordering intermediate data and coordinating computation and I/O when running MPI over HDFS, we can deliver up to 1.92x and 1.78x speedup over MPI I/O and HDFS pipelined-write implementations, respectively.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124315658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minesweeper: A Novel and Fast Ordered-Statistic CFAR Algorithm","authors":"Carl L. Colena, Michael J. Russell, Stephen A. Braun","doi":"10.1109/HPEC43674.2020.9286140","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286140","url":null,"abstract":"A novel algorithm named ‘Minesweeper’ was developed for computing the Ordered Statistic Constant False Alarm Rate (OS-CFAR) in a computationally efficient way. OS-CFAR processing chains are used in radar applications for noise-floor estimation and target discrimination. Unlike other approaches, this algorithm minimizes data reuse by using training-cell geometry and an accumulation matrix to compute the noise estimate. Computing the OS-CFAR in this manner affords some unique efficiencies, including runtime invariance with respect to the bit depth of the input data and to the training geometry. Three implementations of Minesweeper were developed and benchmarked. The optimized GPU implementation (GPU-OPT) performed best in both throughput and latency for large inputs. This algorithm has potential for use in real-time GPU-accelerated SDR applications.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122832162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LessMine: Reducing Sample Space and Data Access for Dense Pattern Mining","authors":"Tianyu Fu, Ziqian Wan, Guohao Dai, Yu Wang, Huazhong Yang","doi":"10.1109/HPEC43674.2020.9286187","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286187","url":null,"abstract":"In the era of “big data”, the graph has been proven to be one of the most important reflections of real-world problems. To refine the core properties of large-scale graphs, dense pattern mining plays a significant role. Because of the complexity of pattern mining problems, conventional implementations often lack scalability, consuming much time and memory space. Previous work (e.g., ASAP [1]) proposed approximate pattern mining as an efficient way to extract structural information from graphs, demonstrating dramatic performance improvements of up to two orders of magnitude. However, we observe three main flaws of ASAP in cases of dense patterns, so we propose LessMine, which reduces the sample space and data access for dense pattern mining. We introduce the reorganization of data structure, the method of concurrent sample, and uniform close. We also provide locality-aware partition for distributed settings. The evaluation shows that our design achieves up to 1829× speedup with a 66% lower error rate compared with ASAP.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121502948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Mapping and Optimization to Kokkos with Polyhedral Compilation","authors":"M. Baskaran, Charles Jin, Benoît Meister, J. Springer","doi":"10.1109/HPEC43674.2020.9286233","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286233","url":null,"abstract":"In the post-Moore's Law era, the quest for exascale computing has resulted in diverse hardware architecture trends, including novel custom and/or specialized processors to accelerate the systems, asynchronous or self-timed computing cores, and near-memory computing architectures. To contend with such heterogeneous and complex hardware targets, there have been advanced software solutions in the form of new programming models and runtimes. However, using these advanced programming models poses productivity and performance portability challenges. This work takes a significant step towards addressing the performance, productivity, and performance portability challenges faced by the high-performance computing and exascale community. We present an automatic mapping and optimization framework that takes sequential code and automatically generates high-performance parallel code in Kokkos, a performance portable parallel programming model targeted for exascale computing. We demonstrate the productivity and performance benefits of optimized mapping to Kokkos using kernels from a critical application project on climate modeling, the Energy Exascale Earth System Model (E3SM) project. 
This work thus shows that automatic generation of Kokkos code enhances the productivity of application developers and enables them to fully utilize the benefits of a programming model such as Kokkos.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131754124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGAs in the Network and Novel Communicator Support Accelerate MPI Collectives","authors":"Pouya Haghi, Anqi Guo, Qingqing Xiong, Rushi Patel, Chen Yang, Tong Geng, Justin T. Broaddus, Ryan J. Marshall, A. Skjellum, M. Herbordt","doi":"10.1109/HPEC43674.2020.9286200","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286200","url":null,"abstract":"MPI collective operations can often be performance killers in HPC applications; we seek to solve this bottleneck by offloading them to reconfigurable hardware within the switch itself, rather than, e.g., the NIC. We have designed a hardware accelerator MPI-FPGA to implement six MPI collectives in the network. Preliminary results show that MPI-FPGA achieves on average 3.9× speedup over conventional clusters in the most likely scenarios. Essential to this work is providing support for sub-communicator collectives. We introduce a novel mechanism that enables the hardware to support a large number of communicators of arbitrary shape, and that is scalable to very large systems. We show how communicator support can be integrated easily into an in-switch hardware accelerator to implement MPI communicators and so enable full offload of MPI collectives. While this mechanism is universally applicable, we implement it in an FPGA cluster; FPGAs provide the ability to couple communication and computation and so are an ideal testbed and have a number of other architectural benefits. 
MPI-FPGA is fully integrated into MPICH and so is transparently usable by MPI applications.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128314797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hardware Root-of-Trust Design for Low-Power SoC Edge Devices","authors":"Alan Ehret, Eliakin Del Rosario, K. Gettings, M. Kinsy","doi":"10.1109/HPEC43674.2020.9286164","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286164","url":null,"abstract":"In this work, we introduce a hardware root-of-trust architecture for low-power edge devices. An accelerator-based SoC design that includes the hardware root-of-trust architecture is developed. An example application for the device is presented. We examine attacks based on physical access given the significant threat they pose to unattended edge systems. The hardware root-of-trust provides security features to ensure the integrity of the SoC execution environment when deployed in uncontrolled, unattended locations. E-fused boot memory ensures the boot code and other security critical software is not compromised after deployment. Digitally signed programmable instruction memory prevents execution of code from untrusted sources. A programmable finite state machine is used to enforce access policies to device resources even if the application software on the device is compromised. Access policies isolate the execution states of application and security-critical software. 
The hardware root-of-trust architecture saves energy with a lower hardware overhead than a separate secure enclave while eliminating software attack surfaces for access control policies.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114766042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KTRussExPLORER: Exploring the Design Space of K-truss Decomposition Optimizations on GPUs","authors":"Safaa Diab, Mhd Ghaith Olabi, I. E. Hajj","doi":"10.1109/HPEC43674.2020.9286165","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286165","url":null,"abstract":"K-truss decomposition is an important method in graph analytics for finding cohesive subgraphs in a graph. Various works have accelerated k-truss decomposition on GPUs and have proposed different optimizations while doing so. The combinations of these optimizations form a large design space. However, most GPU implementations focus on a specific combination or set of combinations in this space. This paper surveys the optimizations applied to k-truss decomposition on GPUs, and presents KTRussExPLORER, a framework for exploring the design space formed by the combinations of these optimizations. Our evaluation shows that the best combination highly depends on the graph of choice, and analyses the conditions that make each optimization attractive. Some of the best combinations we find outperform previous Graph Challenge champions on many large graphs.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128583429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A GraphBLAS solution to the SIGMOD 2014 Programming Contest using multi-source BFS","authors":"Márton Elekes, A. Nagy, Dávid Sándor, János Benjamin Antal, Tim Davis, Gábor Szárnyas","doi":"10.1109/HPEC43674.2020.9286186","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286186","url":null,"abstract":"The GraphBLAS standard defines a set of fundamental building blocks for formulating graph algorithms in the language of linear algebra. Since its first release in 2017, the expressivity of the GraphBLAS API and the performance of its implementations (such as SuiteSparse: GraphBLAS) have been studied on a number of textbook graph algorithms such as BFS, single-source shortest path, and connected components. However, less attention was devoted to other aspects of graph processing such as handling typed and attributed graphs (also known as property graphs), and making use of complex graph query techniques (handling paths, aggregation, and filtering). To study these problems in more detail, we have used GraphBLAS to solve the case study of the 2014 SIGMOD Programming Contest, which defines complex graph processing tasks that require a diverse set of operations. Our solution makes heavy use of multi-source BFS algorithms expressed as sparse matrix-matrix multiplications along with other GraphBLAS techniques such as masking and submatrix extraction. While the queries can be formulated in GraphBLAS concisely, our performance evaluation shows mixed results. 
For some queries and data sets, the performance is competitive with the hand-optimized top solutions submitted to the contest; in other cases, however, it is currently outperformed by orders of magnitude.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134201920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}