Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region: Latest Papers

A Portable Load Balancer for Kubernetes Cluster
Kimitoshi Takahashi, K. Aida, Tomoya Tanjo, Jingtao Sun
{"title":"A Portable Load Balancer for Kubernetes Cluster","authors":"Kimitoshi Takahashi, K. Aida, Tomoya Tanjo, Jingtao Sun","doi":"10.1145/3149457.3149473","DOIUrl":"https://doi.org/10.1145/3149457.3149473","url":null,"abstract":"Linux containers have become very popular these days due to their lightweight nature and portability. Numerous web services are now deployed as clusters of containers. Kubernetes is a popular container management system that enables users to deploy such web services easily, and hence, it facilitates web service migration to the other side of the world. However, since Kubernetes relies on external load balancers provided by cloud providers, it is difficult to use in environments where there are no supported load balancers. This is particularly true for on-premise data centers, or for all but the largest cloud providers. In this paper, we proposed a portable load balancer that was usable in any environment, and hence facilitated web services migration. We implemented a containerized software load balancer that is run by Kubernetes as a part of container cluster, using Linux kernel's Internet Protocol Virtual Server(IPVS). Then we compared the performance of our proposed load balancer with existing iptables Destination Network Address Translation (DNAT) and the Nginx load balancers. During our experiments, we also clarified the importance of two network conditions to derive the best performance: the first was the choice of the overlay network operation mode, and the second was distributing packet processing to multiple cores. 
The results indicated that our proposed IPVS load balancer improved portability of web services without sacrificing the performance.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128918596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Multi-tasking Execution in PGAS Language XcalableMP and Communication Optimization on Many-core Clusters
Keisuke Tsugane, Jinpil Lee, H. Murai, M. Sato
{"title":"Multi-tasking Execution in PGAS Language XcalableMP and Communication Optimization on Many-core Clusters","authors":"Keisuke Tsugane, Jinpil Lee, H. Murai, M. Sato","doi":"10.1145/3149457.3154482","DOIUrl":"https://doi.org/10.1145/3149457.3154482","url":null,"abstract":"Large-scale clusters based on many-core processors such as Intel Xeon Phi have recently been deployed. Multi-tasking execution using task dependencies in OpenMP 4.0 is a promising candidate for facilitating the parallelization of such many-core processors, because this enables users to avoid global synchronization through fine-grained task-to-task synchronization using user-specified data dependencies. Recently, the partitioned global address space (PGAS) model has emerged as a usable distributed-memory programming model. In this paper, we propose a multi-tasking execution model in the PGAS language XcalableMP (XMP) for many-core clusters. The model provides a method to describe interactions between tasks based on point-to-point communications on the global address space. A communication is executed non-collectively among nodes. We implemented the proposed execution model in XMP, and designed a simple code transformation algorithm to MPI and OpenMP. We implemented two benchmarks using our model for preliminary evaluation, namely blocked Cholesky factorization and the Laplace equation solver. Most of the implementations using our model outperform the conventional barrier-based data-parallel model. To improve the performance in many-core clusters, we propose a communication optimization method by dedicating a single thread for communications, to avoid performance problems related to the current multi-threaded MPI execution. As a result, the performances of blocked Cholesky factorization and the Laplace equation solver using this communication optimization are improved to 138% and 119% compared with the barrier-based implementation in Intel Xeon Phi KNL clusters, respectively. 
From the viewpoint of productivity, the program implemented by our model in XMP is almost the same as the implementation based on the OpenMP task depend clause, because XMP enables the parallelization of the serial source code with additional directives and small changes as well as OpenMP.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129124387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
TripleID-C: Low Cost Compressed Representation for RDF Query Processing in GPUs
C. Phongpensri, Pisit Makpaisit
{"title":"TripleID-C: Low Cost Compressed Representation for RDF Query Processing in GPUs","authors":"C. Phongpensri, Pisit Makpaisit","doi":"10.1145/3149457.3155322","DOIUrl":"https://doi.org/10.1145/3149457.3155322","url":null,"abstract":"Resource Description Framework (RDF) is a standard format for representing information linkage across the Internet. It uses Internationalized Resource Identifiers (IRIs), which refer to external information. Typically, RDF data is serialized as a large text file that contains millions of relationships. This paper proposes a compact representation for query processing, called TripleID-C, for large RDF data processing on Graphics Processing Units (GPUs). The representation is based on TripleID, which is converted from the RDF data format. The TripleID format is then converted to TripleID-C, which is derived from either a compressed-row or a compressed-column format. TripleID-C is a compressed format whose size is only 5-10% of the traditional NT file, about 20-30% of the traditional TripleID, and about 50-60% of the original HDT. We also address how to speed up the conversion process by adjusting data structure usage and using multiple threads, so that the conversion runs 30 times faster than the original one.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130800261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Wave Propagation Simulation of Complex Multi-Material Problems with Fast Low-Order Unstructured Finite-Element Meshing and Analysis
K. Fujita, Keisuke Katsushima, T. Ichimura, Masashi Horikoshi, K. Nakajima, M. Hori, Lalith Maddegedara
{"title":"Wave Propagation Simulation of Complex Multi-Material Problems with Fast Low-Order Unstructured Finite-Element Meshing and Analysis","authors":"K. Fujita, Keisuke Katsushima, T. Ichimura, Masashi Horikoshi, K. Nakajima, M. Hori, Lalith Maddegedara","doi":"10.1145/3149457.3149474","DOIUrl":"https://doi.org/10.1145/3149457.3149474","url":null,"abstract":"Many wave-propagation analyses with varying geometries and material properties are expected to be useful for model optimization. Low-order unstructured finite-element methods are suitable for such analyses, as they are capable of modeling multi-material problems with complex geometries; however, the meshing and analysis cost is large. Therefore, in this paper, we developed a fast mesh-generator and analysis method. The robust mesh generator was 17.4-fold faster than a conventional mesh generator, and the predictor algorithm for dynamic implicit finite-element solvers showed a 1.69-fold increase in speed relative to conventional solvers and a 91.3% size-up efficiency on the full Oakforest-PACS system. We demonstrated the usability of the developed meshing and analysis methods via a wave-propagation simulation on a 1.9 billion unstructured tetrahedral-element model using half of the K computer system (41,472 compute nodes).","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"120 23","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131745515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Towards a Composable Computer System
I. Chung, B. Abali, P. Crumley
{"title":"Towards a Composable Computer System","authors":"I. Chung, B. Abali, P. Crumley","doi":"10.1145/3149457.3149466","DOIUrl":"https://doi.org/10.1145/3149457.3149466","url":null,"abstract":"The recent advancement of technology in both software and hardware enables us to revisit the concept of the composable architecture in the system design. The composable system design provides flexibility to serve a variety of workloads. The system offers a dynamic co-design platform that allows experiments and measurements in a controlled environment. This speeds up the system design and software evolution. It also decouples the lifecycles of components. The design consideration includes adopting available technology with the understanding of application characteristics. With the flexibility, we show the design has the potential to be the infrastructure of both cloud computing and HPC architecture serving a variety of workloads.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128672273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Autotuning MPI Collectives using Performance Guidelines
S. Hunold, Alexandra Carpen-Amarie
{"title":"Autotuning MPI Collectives using Performance Guidelines","authors":"S. Hunold, Alexandra Carpen-Amarie","doi":"10.1145/3149457.3149461","DOIUrl":"https://doi.org/10.1145/3149457.3149461","url":null,"abstract":"MPI collective operations provide a standardized interface for performing data movements within a group of processes. The efficiency of collective communication operations depends on the actual algorithm, its implementation, and the specific communication problem (type of communication, message size, and number of processes). Many MPI libraries provide numerous algorithms for specific collective operations. The strategy for selecting an efficient algorithm is often times predefined (hard-coded) in MPI libraries, but some of them, such as Open MPI, allow users to change the algorithm manually. Finding the best algorithm for each case is a hard problem, and several approaches to tune these algorithmic parameters have been proposed. We use an orthogonal approach to the parameter-tuning of MPI collectives, that is, instead of testing individual algorithmic choices provided by an MPI library, we compare the latency of a specific MPI collective operation to the latency of semantically equivalent functions, which we call the mock-up implementations. The structure of the mock-up implementations is defined by self-consistent performance guidelines. The advantage of this approach is that tuning using mock-up implementations is always possible, whether or not an MPI library allows users to select a specific algorithm at run-time. We implement this concept in a library called PGMPITuneLib, which is layered between the user code and the actual MPI implementation. This library selects the best-performing algorithmic pattern of an MPI collective by intercepting MPI calls and redirecting them to our mock-up implementations. 
Experimental results show that PGMPITuneLib can significantly reduce the latency of MPI collectives and, equally important, that it can help identify the tuning potential of MPI libraries.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"1396 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120875834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
A Scalable Multi-Granular Data Model for Data Parallel Workflows
Shin'ichiro Takizawa, Motohiko Matsuda, N. Maruyama, Y. Nakamura
{"title":"A Scalable Multi-Granular Data Model for Data Parallel Workflows","authors":"Shin'ichiro Takizawa, Motohiko Matsuda, N. Maruyama, Y. Nakamura","doi":"10.1145/3149457.3154483","DOIUrl":"https://doi.org/10.1145/3149457.3154483","url":null,"abstract":"Scientific applications consist of many tasks, and each task has different requirements for the degree of parallelism and the data access pattern. To satisfy these requirements, a task scheduler has to assign the required number of processes to each task, and each task's input has to be decomposed and arranged across these processes, taking the data access pattern into account to exploit data locality. However, hand-writing this code is troublesome and error-prone. We propose a multi-view data model where users can specify rules of data decomposition for multi-dimensional data to change the data layout on top of processes and define the unit of parallel processing with simple directives. Our framework conducts data arrangement and affinity-aware task scheduling transparently to users by following the specified rules. Through a case study of a lattice QCD simulation program, we confirmed that our proposal reduced programming effort compared with hand-written MPI code, with performance penalties of up to 17%.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131009307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Study on Open Source Software for Large-Scale Data Visualization on SPARC64fx based HPC Systems
J. Nonaka, Motohiko Matsuda, Takashi Shimizu, Naohisa Sakamoto, M. Fujita, K. Onishi, E. C. Inacio, Shun Ito, F. Shoji, K. Ono
{"title":"A Study on Open Source Software for Large-Scale Data Visualization on SPARC64fx based HPC Systems","authors":"J. Nonaka, Motohiko Matsuda, Takashi Shimizu, Naohisa Sakamoto, M. Fujita, K. Onishi, E. C. Inacio, Shun Ito, F. Shoji, K. Ono","doi":"10.1145/3149457.3155323","DOIUrl":"https://doi.org/10.1145/3149457.3155323","url":null,"abstract":"In this paper, we present a study on the available open-source software (OSS) for large-scale data visualization on the SPARC64fx based HPC systems, such as the K computer and also the Fujitsu PRIMEHPC FX family of supercomputers (FX10 and FX100), which are commonly available throughout Japan. It is widely known that these HPC systems have been generating a vast amount of simulation results in a wide range of science and engineering fields. However, there was not much information regarding the large-scale data visualization software and approaches in such HPC infrastructure. In this work, we focused on the visualization approaches where the HPC hardware resources are directly used for the visualization processing, which can be helpful to minimize the large data transfer issue for the visualization and analysis purposes. This study includes both OpenGL (Open Graphics Library) and non-OpenGL based visualization approaches, and also the availability of the GLSL (OpenGL Shading Language) handling functionalities. 
Although it is a short survey focusing only on the post-processing issue, we expect that this study can be useful and helpful for the current and future potential users of the SPARC64fx CPU based HPC systems, which are still in active use throughout Japan.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122696393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Time-space tiling with tile-level parallelism for the 3D FDTD method
Takeshi Fukaya, T. Iwashita
{"title":"Time-space tiling with tile-level parallelism for the 3D FDTD method","authors":"Takeshi Fukaya, T. Iwashita","doi":"10.1145/3149457.3149478","DOIUrl":"https://doi.org/10.1145/3149457.3149478","url":null,"abstract":"Our aim in this work is to improve the performance of the multi-threaded 3D FDTD solver using time-space tiling techniques that enable tile-level parallelization. The implementation of tile-level parallelization that we have used is based on the so-called diamond tiling technique. In this paper, we present a systematic manner for introducing time-space tiling techniques into the 3D FDTD solver and compare four different approaches. Our performance evaluation on a state-of-the-art multi-core processor demonstrated the effectiveness of the time-space tiling techniques with tile-level parallelism for the 3D FDTD method. For the problem with 200^3 grid points, our implementation with two-dimensional tile-level parallelism achieved a speedup of 1.88 times over the naive implementation, while for the problem of 300^3 grid points, our implementation with one-dimensional tile-level parallelism showed a speedup of 2.22 times. Both results are better than the speedup obtained from an implementation with intra-tile parallelization presented in a previous work.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117132519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Performing External Join Operator on PostgreSQL with Data Transfer Approach
Ryota Takizawa, H. Kawashima, Ryuya Mitsuhashi, O. Tatebe
{"title":"Performing External Join Operator on PostgreSQL with Data Transfer Approach","authors":"Ryota Takizawa, H. Kawashima, Ryuya Mitsuhashi, O. Tatebe","doi":"10.1145/3149457.3149480","DOIUrl":"https://doi.org/10.1145/3149457.3149480","url":null,"abstract":"With the development of sensing devices, the amount of data managed by humans has been rapidly increasing. To manage such huge data, relational database management systems (RDBMSs) play a key role. An RDBMS models real-world data as n-ary relational tables. The join operator is one of the most important relational operators, and its acceleration has been studied widely and deeply. How can an RDBMS provide such an efficient join operator? The performance improvement of the join operator has been deeply studied for a decade, and many techniques have already been proposed. The problem that we face is how to actually use such excellent techniques in real RDBMSs. We propose to implement an efficient join technique by the data transfer approach. The approach creates a hook point inside the RDBMS internals, pulls data streams from the operator pipeline in the RDBMS, applies our original join operator to the data, and finally returns the result to the operator pipeline in the RDBMS. The result of the experiment showed that our proposed method achieved a 1.42x speedup compared with PostgreSQL. 
Our code is available on GitHub.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124006482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0