Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region: Latest Papers

A Portable Load Balancer for Kubernetes Cluster

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI: 10.1145/3149457.3149473

Kimitoshi Takahashi, K. Aida, Tomoya Tanjo, Jingtao Sun

Abstract: Linux containers have become very popular due to their lightweight nature and portability. Numerous web services are now deployed as clusters of containers. Kubernetes is a popular container management system that enables users to deploy such web services easily, and hence it facilitates web service migration to the other side of the world. However, since Kubernetes relies on external load balancers provided by cloud providers, it is difficult to use in environments where there are no supported load balancers. This is particularly true for on-premises data centers, or for all but the largest cloud providers. In this paper, we propose a portable load balancer that is usable in any environment and hence facilitates web service migration. We implemented a containerized software load balancer that is run by Kubernetes as part of the container cluster, using the Linux kernel's Internet Protocol Virtual Server (IPVS). We then compared the performance of our proposed load balancer with the existing iptables Destination Network Address Translation (DNAT) and Nginx load balancers. During our experiments, we also clarified the importance of two network conditions for deriving the best performance: the first is the choice of the overlay network operation mode, and the second is distributing packet processing to multiple cores. The results indicate that our proposed IPVS load balancer improves the portability of web services without sacrificing performance.

Citations: 17

Multi-tasking Execution in PGAS Language XcalableMP and Communication Optimization on Many-core Clusters

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI: 10.1145/3149457.3154482

Keisuke Tsugane, Jinpil Lee, H. Murai, M. Sato

Abstract: Large-scale clusters based on many-core processors such as Intel Xeon Phi have recently been deployed. Multi-tasking execution using task dependencies in OpenMP 4.0 is a promising candidate for facilitating parallelization on such many-core processors, because it enables users to avoid global synchronization through fine-grained task-to-task synchronization using user-specified data dependencies. Recently, the partitioned global address space (PGAS) model has emerged as a usable distributed-memory programming model. In this paper, we propose a multi-tasking execution model in the PGAS language XcalableMP (XMP) for many-core clusters. The model provides a method to describe interactions between tasks based on point-to-point communications on the global address space. A communication is executed non-collectively among nodes. We implemented the proposed execution model in XMP, and designed a simple code transformation algorithm to MPI and OpenMP. We implemented two benchmarks using our model for preliminary evaluation, namely blocked Cholesky factorization and the Laplace equation solver. Most of the implementations using our model outperform the conventional barrier-based data-parallel model. To improve the performance on many-core clusters, we propose a communication optimization method that dedicates a single thread to communications, to avoid performance problems related to the current multi-threaded MPI execution. As a result, the performance of blocked Cholesky factorization and the Laplace equation solver using this communication optimization is improved to 138% and 119%, respectively, compared with the barrier-based implementation on Intel Xeon Phi KNL clusters. From the viewpoint of productivity, the program implemented with our model in XMP is almost the same as the implementation based on the OpenMP task depend clause, because XMP, like OpenMP, enables the parallelization of serial source code with additional directives and small changes.

Citations: 7

TripleID-C: Low Cost Compressed Representation for RDF Query Processing in GPUs

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI: 10.1145/3149457.3155322

C. Phongpensri, Pisit Makpaisit

Citations: 3

Wave Propagation Simulation of Complex Multi-Material Problems with Fast Low-Order Unstructured Finite-Element Meshing and Analysis

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI: 10.1145/3149457.3149474

K. Fujita, Keisuke Katsushima, T. Ichimura, Masashi Horikoshi, K. Nakajima, M. Hori, Lalith Maddegedara

Citations: 8

Towards a Composable Computer System

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI: 10.1145/3149457.3149466

I. Chung, B. Abali, P. Crumley

Citations: 21

Autotuning MPI Collectives using Performance Guidelines

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI: 10.1145/3149457.3149461

S. Hunold, Alexandra Carpen-Amarie

Abstract: MPI collective operations provide a standardized interface for performing data movements within a group of processes. The efficiency of collective communication operations depends on the actual algorithm, its implementation, and the specific communication problem (type of communication, message size, and number of processes). Many MPI libraries provide numerous algorithms for specific collective operations. The strategy for selecting an efficient algorithm is often predefined (hard-coded) in MPI libraries, but some of them, such as Open MPI, allow users to change the algorithm manually. Finding the best algorithm for each case is a hard problem, and several approaches to tune these algorithmic parameters have been proposed. We use an orthogonal approach to the parameter tuning of MPI collectives: instead of testing individual algorithmic choices provided by an MPI library, we compare the latency of a specific MPI collective operation to the latency of semantically equivalent functions, which we call the mock-up implementations. The structure of the mock-up implementations is defined by self-consistent performance guidelines. The advantage of this approach is that tuning using mock-up implementations is always possible, whether or not an MPI library allows users to select a specific algorithm at run-time. We implement this concept in a library called PGMPITuneLib, which is layered between the user code and the actual MPI implementation. This library selects the best-performing algorithmic pattern of an MPI collective by intercepting MPI calls and redirecting them to our mock-up implementations. Experimental results show that PGMPITuneLib can significantly reduce the latency of MPI collectives and, equally important, that it can help identify the tuning potential of MPI libraries.
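The selection step described in this abstract can be illustrated with a minimal Python sketch. It is not PGMPITuneLib's code or API: the function names are hypothetical, the latency values are placeholders rather than measurements, and no MPI calls are made. The only fact assumed about MPI is the well-known guideline that MPI_Allreduce should be no slower than MPI_Reduce followed by MPI_Bcast, which serves here as the mock-up.

```python
from typing import Dict


def select_implementation(latencies: Dict[str, float], default: str) -> str:
    """Return the candidate with the lowest latency.

    `latencies` maps a candidate name (the library's own collective or a
    semantically equivalent mock-up) to a measured latency in seconds.
    The default collective is kept unless a mock-up is strictly faster.
    """
    best = default
    for name, latency in latencies.items():
        if latency < latencies[best]:
            best = name
    return best


def violates_guideline(latencies: Dict[str, float], default: str) -> bool:
    """A guideline is violated when some mock-up beats the default collective."""
    return select_implementation(latencies, default) != default


# Placeholder latencies for one message size (illustrative only, not measured).
example = {
    "MPI_Allreduce": 10.0e-6,
    "MPI_Reduce+MPI_Bcast": 7.5e-6,  # hypothetical mock-up label
}
chosen = select_implementation(example, default="MPI_Allreduce")
```

In a real tuning layer the chosen name would presumably be recorded per collective, message size, and process count, so that intercepted calls can be redirected accordingly.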

Citations: 11

A Scalable Multi-Granular Data Model for Data Parallel Workflows

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI: 10.1145/3149457.3154483

Shin'ichiro Takizawa, Motohiko Matsuda, N. Maruyama, Y. Nakamura

Citations: 1

A Study on Open Source Software for Large-Scale Data Visualization on SPARC64fx based HPC Systems

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI: 10.1145/3149457.3155323

J. Nonaka, Motohiko Matsuda, Takashi Shimizu, Naohisa Sakamoto, M. Fujita, K. Onishi, E. C. Inacio, Shun Ito, F. Shoji, K. Ono

Abstract: In this paper, we present a study on the available open-source software (OSS) for large-scale data visualization on SPARC64fx based HPC systems, such as the K computer and the Fujitsu PRIMEHPC FX family of supercomputers (FX10 and FX100), which are commonly available throughout Japan. It is widely known that these HPC systems have been generating a vast amount of simulation results in a wide range of science and engineering fields. However, there has not been much information regarding large-scale data visualization software and approaches on such HPC infrastructure. In this work, we focus on visualization approaches in which the HPC hardware resources are used directly for the visualization processing, which can help minimize the large data transfer issue for visualization and analysis purposes. This study includes both OpenGL (Open Graphics Library) and non-OpenGL based visualization approaches, and also the availability of GLSL (OpenGL Shading Language) handling functionalities. Although it is a short survey focusing only on the post-processing issue, we expect that this study can be useful for current and potential future users of the SPARC64fx CPU based HPC systems, which are still in active use throughout Japan.

Citations: 6

Time-space tiling with tile-level parallelism for the 3D FDTD method

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI: 10.1145/3149457.3149478

Takeshi Fukaya, T. Iwashita

Citations: 7

Performing External Join Operator on PostgreSQL with Data Transfer Approach

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI: 10.1145/3149457.3149480

Ryota Takizawa, H. Kawashima, Ryuya Mitsuhashi, O. Tatebe

Abstract: With the development of sensing devices, the amount of data that people manage has been rapidly increasing. To manage such huge data, the relational database management system (RDBMS) plays a key role. An RDBMS models real-world data as n-ary relational tables. The join operator is one of the most important relational operators, and its acceleration has been studied widely and deeply. How can an RDBMS provide such an efficient join operator? The performance improvement of the join operator has been studied deeply for a decade, and many techniques have already been proposed. The problem that we face is how to actually use such excellent techniques in real RDBMSs. We propose to implement an efficient join technique using the data transfer approach. The approach adds a hook point inside the RDBMS internals, pulls data streams from the operator pipeline in the RDBMS, applies our original join operator to the data, and finally returns the result to the operator pipeline in the RDBMS. The experimental results show that our proposed method achieves a 1.42x speedup compared with PostgreSQL. Our code is available on GitHub.
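The pull, join, and return flow described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which join algorithm the authors use, so a textbook in-memory hash join stands in for it, and the function and parameter names are hypothetical rather than part of PostgreSQL or the authors' code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple  # a row is modelled as a plain tuple of column values


def hash_join(
    build_rows: Iterable[Row],
    probe_rows: Iterable[Row],
    build_key: int,
    probe_key: int,
) -> Iterator[Row]:
    """Equi-join two row streams on the given column positions.

    The build stream is consumed fully into a hash table; the probe stream
    is then read row by row, and each match is yielded to the caller,
    mimicking results flowing back into an operator pipeline.
    """
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    for row in probe_rows:
        # .get avoids inserting empty buckets for keys without a match
        for match in table.get(row[probe_key], ()):
            yield match + row


# Two small streams standing in for tuples pulled out of the pipeline.
left = iter([(1, "a"), (2, "b")])
right = iter([(2, "x"), (3, "y"), (2, "z")])
joined = list(hash_join(left, right, build_key=0, probe_key=0))
```

Writing the operator as a generator keeps the probe side streaming, so the surrounding pipeline can consume results as they are produced.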

Citations: 0