2018 IEEE 25th International Conference on High Performance Computing (HiPC): Latest Papers, Page 4

Adaptive Pattern Matching with Reinforcement Learning for Dynamic Graphs

2018 IEEE 25th International Conference on High Performance Computing (HiPC) Pub Date : 2018-12-01 DOI: 10.1109/HiPC.2018.00019

H. Kanezashi, T. Suzumura, D. García-Gasulla, Min-hwan Oh, S. Matsuoka

Abstract: Graph pattern matching algorithms to handle million-scale dynamic graphs are widely used in many applications such as social network analytics and suspicious transaction detections from financial networks. On the other hand, the computation complexity of many graph pattern matching algorithms is expensive, and it is not affordable to extract patterns from million-scale graphs. Moreover, most real-world networks are time-evolving, updating their structures continuously, which makes it harder to update and output newly matched patterns in real time. Many incremental graph pattern matching algorithms which reduce the number of updates have been proposed to handle such dynamic graphs. However, it is still challenging to recompute vertices in the incremental graph pattern matching algorithms in a single process, and that prevents the real-time analysis. We propose an incremental graph pattern matching algorithm to deal with time-evolving graph data and also propose an adaptive optimization system based on reinforcement learning to recompute vertices in the incremental process more efficiently. Then we discuss the qualitative efficiency of our system with several types of data graphs and pattern graphs. We evaluate the performance using million-scale attributed and time-evolving social graphs. Our incremental algorithm is up to 10.1 times faster than an existing graph pattern matching and 1.95 times faster with the adaptive systems in a computation node than naive incremental processing.
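
The general idea of incremental pattern matching mentioned in this abstract can be illustrated with a very small sketch. It is not the authors' algorithm and involves no reinforcement learning: it assumes the pattern is a triangle and the graph is undirected, and the class name `IncrementalTriangleMatcher` is hypothetical. When an edge arrives, only the neighbourhoods of its two endpoints are inspected instead of re-running the match over the whole graph.

```python
from collections import defaultdict


class IncrementalTriangleMatcher:
    """Toy incremental matcher for one fixed pattern: a triangle (illustrative only)."""

    def __init__(self):
        self.adj = defaultdict(set)   # vertex -> set of neighbours
        self.matches = set()          # every triangle found so far

    def add_edge(self, u, v):
        """Insert an undirected edge and return only the newly matched triangles."""
        if u == v or v in self.adj[u]:
            return set()              # self-loop or duplicate edge: nothing new
        # A new triangle must contain the new edge, so its third vertex
        # is a common neighbour of u and v in the graph before the update.
        common = self.adj[u] & self.adj[v]
        new_matches = {frozenset((u, v, w)) for w in common}
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.matches |= new_matches
        return new_matches


matcher = IncrementalTriangleMatcher()
first = matcher.add_edge(1, 2)
second = matcher.add_edge(2, 3)
third = matcher.add_edge(1, 3)    # closes the triangle {1, 2, 3}
matcher.add_edge(3, 4)
fourth = matcher.add_edge(1, 4)   # closes the triangle {1, 3, 4}
print(sorted(sorted(t) for t in matcher.matches))
```

Each update touches only two adjacency sets, which is the property that makes incremental processing attractive for time-evolving graphs; general patterns need considerably more bookkeeping than this sketch shows.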

Citations: 10

DeepHyper: Asynchronous Hyperparameter Search for Deep Neural Networks

2018 IEEE 25th International Conference on High Performance Computing (HiPC) Pub Date : 2018-12-01 DOI: 10.1109/HiPC.2018.00014

Prasanna Balaprakash, Michael A. Salim, T. Uram, V. Vishwanath, Stefan M. Wild

Abstract: Hyperparameters employed by deep learning (DL) methods play a substantial role in the performance and reliability of these methods in practice. Unfortunately, finding performance optimizing hyperparameter settings is a notoriously difficult task. Hyperparameter search methods typically have limited production-strength implementations or do not target scalability within a highly parallel machine, portability across different machines, experimental comparison between different methods, and tighter integration with workflow systems. In this paper, we present DeepHyper, a Python package that provides a common interface for the implementation and study of scalable hyperparameter search methods. It adopts the Balsam workflow system to hide the complexities of running large numbers of hyperparameter configurations in parallel on high-performance computing (HPC) systems. We implement and study asynchronous model-based search methods that consist of sampling a small number of input hyperparameter configurations and progressively fitting surrogate models over the input-output space until exhausting a user-defined budget of evaluations. We evaluate the efficacy of these methods relative to approaches such as random search, genetic algorithms, Bayesian optimization, and hyperband on DL benchmarks on CPU- and GPU-based HPC systems.
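
The model-based search loop described in this abstract (sample a few configurations, fit a surrogate, repeat until the evaluation budget is spent) can be sketched in a few lines. This is a minimal sequential illustration under stated assumptions, not DeepHyper's API or implementation: it does not use Balsam, it is not asynchronous, the surrogate is a simple k-nearest-neighbour average, and every name here (`knn_surrogate`, `model_based_search`, `toy_objective`) is hypothetical.

```python
import random


def knn_surrogate(history, x, k=3):
    """Predict the objective at x as the mean of the k nearest evaluated points."""
    nearest = sorted(history, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def model_based_search(objective, budget=30, n_initial=5, n_candidates=50, seed=0):
    """Minimise objective over [0, 1) using at most `budget` evaluations."""
    rng = random.Random(seed)
    history = []
    # Step 1: evaluate a small number of randomly sampled configurations.
    for _ in range(n_initial):
        x = rng.random()
        history.append((x, objective(x)))
    # Step 2: the surrogate sees the growing history on every iteration;
    # evaluate the candidate it predicts to be best, until the budget is spent.
    while len(history) < budget:
        candidates = [rng.random() for _ in range(n_candidates)]
        x_next = min(candidates, key=lambda c: knn_surrogate(history, c))
        history.append((x_next, objective(x_next)))
    return history


def toy_objective(learning_rate):
    """Stand-in for an expensive training run (assumed, for illustration)."""
    return (learning_rate - 0.3) ** 2


history = model_based_search(toy_objective)
best_x, best_y = min(history, key=lambda pair: pair[1])
print(f"evaluations={len(history)} best_x={best_x:.3f} best_y={best_y:.5f}")
```

In an asynchronous variant, new configurations would be proposed as soon as any worker finishes rather than after each single evaluation; the sequential loop above only conveys the sample, fit, and evaluate cycle.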

Citations: 90

Compiling SIMT Programs on Multi- and Many-Core Processors with Wide Vector Units: A Case Study with CUDA

2018 IEEE 25th International Conference on High Performance Computing (HiPC) Pub Date : 2018-12-01 DOI: 10.1109/HiPC.2018.00022

Hancheng Wu, J. Ravi, M. Becchi

Abstract: Manycore processors and coprocessors with wide vector extensions, such as Intel Phi and Skylake devices, have become popular due to their high throughput capability. Performance optimization on these devices requires using both their x86-compatible cores and their vector units. While the x86-compatible cores can be programmed using traditional programming interfaces following the MIMD model, such as POSIX threads, MPI and OpenMP, the SIMD vector units are harder to program. The Intel software stack provides two approaches for code vectorization: automatic vectorization through the Intel compiler and manual vectorization through vector intrinsics. While the Intel compiler often fails to vectorize code with complex control flows and function calls, the manual approach is error-prone and leads to less portable code. Hence, there has been an increasing interest in SIMT programming tools allowing the simultaneous use of x86 cores and vector units while providing programmability and code portability. However, the effective implementation of the SIMT model on these hybrid architectures is not well understood. In this work, we target this problem. First, we propose a set of compiler techniques to transform programs written using a SIMT programming model (a subset of CUDA C) into code that leverages both the x86 cores and the vector units of a hybrid MIMD/SIMD architecture, thus providing programmability, high system utilization and performance. Second, we evaluate the proposed techniques on Xeon Phi and Skylake processors using micro-benchmarks and real-world applications. Third, we compare the resulting performance with that achieved by the same code on GPUs. Based on this analysis, we point out the main challenges in supporting the SIMT model on hybrid MIMD/SIMD architectures, while providing performance comparable to that of SIMT systems (e.g., GPUs).

Citations: 0

Expediting Parallel Graph Connectivity Algorithms

2018 IEEE 25th International Conference on High Performance Computing (HiPC) Pub Date : 2018-12-01 DOI: 10.1109/HiPC.2018.00017

Kishore Kothapalli, Mihir Wadwekar

Abstract: Finding whether a graph is k-connected, and the identification of its k-connected components is a fundamental problem in graph theory. For this reason, there have been several algorithms for this problem in both the sequential and parallel settings. Several recent sequential and parallel algorithms for k-connectivity rely on one or more breadth-first traversals of the input graph. While BFS can be made very efficient in a sequential setting, the same cannot be said in the case of parallel environments. A major factor in this difficulty is due to the inherent requirement to use a shared queue, balance work among multiple threads in every round, synchronization, and the like. Optimizing the execution of BFS on many current parallel architectures is therefore quite challenging. For this reason, it can be noticed that the time spent by the current parallel graph connectivity algorithms on BFS operations is usually a significant portion of their overall runtime. In this paper, we study how one can, in the context of algorithms for graph connectivity, mitigate the practical inefficiency of relying on BFS operations in parallel. Our technique suggests that such algorithms may not require a BFS of the input graph but actually can work with a sparse spanning subgraph of the input graph. The incorrectness introduced by not using a BFS spanning tree can then be offset by further post-processing steps on suitably defined small auxiliary graphs. Our experiments on finding the 2, and 3-connectivity of graphs on Nvidia K40c GPUs improve the state-of-the-art on the corresponding problems by a factor 2.2x, and 2.1x respectively.
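
The abstract's point that a sparse spanning subgraph can be obtained without a breadth-first traversal can be illustrated with a small sequential sketch. This is not the paper's GPU algorithm and it does not compute 2- or 3-connectivity: it only builds a spanning forest with union-find, processing edges in arbitrary order with no shared queue or level-by-level rounds. The function name `spanning_forest` is hypothetical.

```python
def spanning_forest(num_vertices, edges):
    """Return (forest_edges, component_count) for an undirected graph.

    Uses union-find instead of BFS, so edges can be visited in any order.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v   # merge the two components
            forest.append((u, v))     # this edge joins the spanning forest
    components = len({find(v) for v in range(num_vertices)})
    return forest, components


# Vertices 0-2 form a cycle, 3-4 share an edge, and 5 is isolated.
example_edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
forest, components = spanning_forest(6, example_edges)
print(forest, components)
```

The resulting forest is generally not a BFS tree, so algorithms that rely on BFS level structure would need extra correction steps, which is the trade-off the abstract describes.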

Citations: 3

Shared-Memory Parallel Maximal Clique Enumeration

2018 IEEE 25th International Conference on High Performance Computing (HiPC) Pub Date : 2018-07-25 DOI: 10.1109/HiPC.2018.00016

A. Das, Seyed-Vahid Sanei-Mehri, S. Tirthapura

Citations: 17

Parallel Nonnegative CP Decomposition of Dense Tensors

2018 IEEE 25th International Conference on High Performance Computing (HiPC) Pub Date : 2018-06-19 DOI: 10.1109/HiPC.2018.00012

Grey Ballard, Koby Hayashi, R. Kannan

Citations: 22

Why do Users Kill HPC Jobs?

2018 IEEE 25th International Conference on High Performance Computing (HiPC) Pub Date : 2018-05-03 DOI: 10.1109/HiPC.2018.00039

Venkatesh Prasad Ranganath, Daniel Andresen

Citations: 0

The Future of Supercomputing

2018 IEEE 25th International Conference on High Performance Computing (HiPC) Pub Date : 2014-06-10 DOI: 10.1145/2597652.2616585

M. Snir

Abstract: For over two decades, supercomputing evolved in a relatively straightforward manner: Supercomputers were assembled out of commodity microprocessors and leveraged their exponential increase in performance, due to Moore's Law. This simple model has been under stress since clock speed stopped growing a decade ago: Increased performance has required a commensurate increase in the number of concurrent threads. The evolution of device technology is likely to be even less favorable in the coming decade: The growth in CMOS performance is nearing its end, and no alternative technology is ready to replace CMOS. The continued shrinking of device size requires increasingly expensive technologies, and may not lead to improvements in cost/performance ratio; at which point, it ceases to make sense for commodity technology. These obstacles need not imply stagnation in supercomputer performance. In the long run, new computing models will come to the rescue. In the short run, more exotic, non-commodity device technologies can provide two or more orders of magnitude improvements in performance. Finally, better hardware and software architectures can significantly increase the efficiency of scientific computing platforms. While continued progress is possible, it will require a significant international research effort and major investments in future large-scale "computational instruments".

Citations: 8