{"title":"Scientific computing using consumer video-gaming embedded devices","authors":"Glenn Volkema, G. Khanna","doi":"10.1109/HPEC.2017.8091055","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091055","url":null,"abstract":"The performance of commodity video-gaming embedded devices (consoles, graphics cards, tablets, etc.) has been advancing at a rapid pace owing to strong consumer demand and stiff market competition. Gaming devices are currently amongst the most powerful and cost-effective computational technologies available in quantity. In this article, we evaluate a sample of current generation video-gaming devices for scientific computing and compare their performance with specialized supercomputing general purpose graphics processing units (GPGPUs). We use the OpenCL SHOC benchmark suite, which is a measure of the performance of compute hardware on various different scientific application kernels, and also a popular public distributed computing application, Einstein@Home in the field of gravitational physics for the purposes of this evaluation.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132826689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scale-free structure for real world networks","authors":"R. Veras, F. Franchetti","doi":"10.1109/HPEC.2017.8091074","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091074","url":null,"abstract":"The field of High Performance Computing (HPC) is defined by application in physics and engineering. These problems drove the development of libraries such as LAPACK, which cast their performance in terms of more specialized building block such as the BLAS. Now that we see a rise in simulation and computational analysis in fields such as biology and the social sciences, how do we leverage existing HPC approaches to these domains. The GraphBLAS project reconciles graph analytics with the machinery of linear algebra libraries. Like their Dense Linear Algebra (DLA) counterpart, the GraphBLAS expresses complex operations in terms of smaller primitives. This paper focuses on efficiently storing real world networks, such that for these graph primitives we can obtain the level of performance seen in DLA. We provide a hierarchical data structured called GERMV, which is an extension of our previous Recursive Matrix Vector (RMV). If the network in question exhibits a scale-free structure, namely hierarchical communities, then our data structure enables high performance. We demonstrate high performance for Sparse Matrix Vector (spMV) and PageRank on real world web graphs.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133115480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Streaming graph challenge: Stochastic block partition","authors":"E. Kao, V. Gadepally, M. Hurley, Michael Jones, J. Kepner, S. Mohindra, P. Monticciolo, A. Reuther, S. Samsi, William S. Song, D. Staheli, S. Smith","doi":"10.1109/HPEC.2017.8091040","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091040","url":null,"abstract":"An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation are available at GraphChallenge.org.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132327328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Static graph challenge: Subgraph isomorphism","authors":"S. Samsi, V. Gadepally, M. Hurley, Michael Jones, E. Kao, S. Mohindra, P. Monticciolo, A. Reuther, S. Smith, William S. Song, D. Staheli, J. Kepner","doi":"10.1109/HPEC.2017.8091039","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091039","url":null,"abstract":"The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges to move these communities forward. The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems. The Subgraph Isomorphism Graph Challenge is a holistic specification with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. Subgraph isomorphism is amenable to both vertex-centric implementations and array-based implementations (e.g., using the Graph-BLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed graph challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented and their single threaded performance have been measured. Specifications, data, and software are publicly available at GraphChallenge.org.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"600 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115108428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preconditioned spectral clustering for stochastic block partition streaming graph challenge (Preliminary version at arXiv.)","authors":"David Zhuzhunashvili, A. Knyazev","doi":"10.1109/HPEC.2017.8091045","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091045","url":null,"abstract":"Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is demonstrated to efficiently solve eigenvalue problems for graph Laplacians that appear in spectral clustering. For static graph partitioning, 10–20 iterations of LOBPCG without preconditioning result in ∼10× error reduction, enough to achieve 100% correctness for all Challenge datasets with known truth partitions, e.g., for graphs with 5K/.1M (50K/1M) Vertices/Edges in 2 (7) seconds, compared to over 5,000 (30,000) seconds needed by the baseline Python code. Our Python code 100% correctly determines 98 (160) clusters from the Challenge static graphs with 0.5M (2M) vertices in 270 (1,700) seconds using 10GB (50GB) of memory. Our single-precision MATLAB code calculates the same clusters at half time and memory. For streaming graph partitioning, LOBPCG is initiated with approximate eigenvectors of the graph Laplacian already computed for the previous graph, in many cases reducing 2–3 times the number of required LOBPCG iterations, compared to the static case. Our spectral clustering is generic, i.e. assuming nothing specific of the block model or streaming, used to generate the graphs for the Challenge, in contrast to the base code. Nevertheless, in 10-stage streaming comparison with the base code for the 5K graph, the quality of our clusters is similar or better starting at stage 4 (7) for emerging edging (snowballing) streaming, while the computations are over 100–1000 faster.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128875451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed triangle counting in the Graphulo matrix math library","authors":"D. Hutchison","doi":"10.1109/HPEC.2017.8091041","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091041","url":null,"abstract":"Triangle counting is a key algorithm for large graph analysis. The Graphulo library provides a framework for implementing graph algorithms on the Apache Accumulo distributed database. In this work we adapt two algorithms for counting triangles, one that uses the adjacency matrix and another that also uses the incidence matrix, to the Graphulo library for serverside processing inside Accumulo. Cloud-based experiments show a similar performance profile for these different approaches on the family of power law Graph500 graphs, for which data skew increasingly bottlenecks. These results motivate the design of skew-aware hybrid algorithms that we propose for future work.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122185648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GraphBLAS C API: Ideas for future versions of the specification","authors":"T. Mattson, Carl Yang, Scott McMillan, A. Buluç, J. Moreira","doi":"10.1109/HPEC.2017.8091095","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091095","url":null,"abstract":"The GraphBLAS C specification provisional release 1.0 is complete. To manage the scope of the project, we had to defer important functionality to a future version of the specification. For example, we are well aware that many algorithms benefit from an inspector-executor execution strategy. We also know that users would benefit from a number of standard predefined semirings as well as more general user-defined types. These and other features are described in this paper in the context of a future release of the GraphBLAS C API.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128913352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"D4M 3.0: Extended database and language capabilities","authors":"Lauren Milechin, V. Gadepally, S. Samsi, J. Kepner, Alexander Chen, D. Hutchison","doi":"10.1109/HPEC.2017.8091083","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091083","url":null,"abstract":"The D4M tool was developed to address many of today's data needs. This tool is used by hundreds of researchers to perform complex analytics on unstructured data. Over the past few years, the D4M toolbox has evolved to support connectivity with a variety of new database engines, including SciDB. D4M-Graphulo provides the ability to do graph analytics in the Apache Accumulo database. Finally, an implementation using the Julia programming language is also now available. In this article, we describe some of our latest additions to the D4M toolbox and our upcoming D4M 3.0 release. We show through benchmarking and scaling results that we can achieve fast SciDB ingest using the D4M-SciDB connector, that using Graphulo can enable graph algorithms on scales that can be memory limited, and that the Julia implementation of D4M achieves comparable performance or exceeds that of the existing MATLAB® implementation.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115101155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling massive deep neural networks with the GraphBLAS","authors":"J. Kepner, Manoj Kumar, J. Moreira, P. Pattnaik, M. Serrano, H. Tufo","doi":"10.1109/HPEC.2017.8091098","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091098","url":null,"abstract":"Deep Neural Networks (DNNs) have emerged as a core tool for machine learning. The computations performed during DNN training and inference are dominated by operations on the weight matrices describing the DNN. As DNNs incorporate more stages and more nodes per stage, these weight matrices may be required to be sparse because of memory limitations. The GraphBLAS.org math library standard was developed to provide high performance manipulation of sparse weight matrices and input/output vectors. For sufficiently sparse matrices, a sparse matrix library requires significantly less memory than the corresponding dense matrix implementation. This paper provides a brief description of the mathematics underlying the GraphBLAS. In addition, the equations of a typical DNN are rewritten in a form designed to use the GraphBLAS. An implementation of the DNN is given using a preliminary GraphBLAS C library. The performance of the GraphBLAS implementation is measured relative to a standard dense linear algebra library implementation. For various sizes of DNN weight matrices, it is shown that the GraphBLAS sparse implementation outperforms a BLAS dense implementation as the weight matrix becomes sparser.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131294525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance measurements of supercomputing and cloud storage solutions","authors":"Michael Jones, J. Kepner, W. Arcand, David Bestor, Bill Bergeron, V. Gadepally, Michael Houle, M. Hubbell, P. Michaleas, Andrew Prout, A. Reuther, S. Samsi, Paul Monticiollo","doi":"10.1109/HPEC.2017.8091073","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091073","url":null,"abstract":"Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to inform decisions about which storage systems are best suited for particular tasks. This work provides these measurements for two of the most popular storage technologies: Lustre and Amazon S3. Lustre is an open-source, high performance, parallel file system used by many of the largest supercomputers in the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web Services offering, and offers a scalable, distributed option to store and retrieve data from anywhere on the Internet. Parallel processing is essential for achieving high performance on modern storage systems. The performance tests used span the gamut of parallel I/O scenarios, ranging from single-client, single-node Amazon S3 and Lustre performance to a large-scale, multi-client test designed to demonstrate the capabilities of a modern storage appliance under heavy load. These results show that, when parallel I/O is used correctly (i.e., many simultaneous read or write processes), full network bandwidth performance is achievable and ranged from 10 gigabits/s over a 10 GigE S3 connection to 0.35 terabits/s using Lustre on a 1200 port 10 GigE switch. These results demonstrate that S3 is well-suited to sharing vast quantities of data over the Internet, while Lustre is well-suited to processing large quantities of data locally.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131361639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}