2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC): Latest Papers

Exposing data locality in HPC-based systems by using the HDFS backend
José Rivadeneira, Félix García Carballeira, J. Carretero, Francisco Javier García Blas
{"title":"Exposing data locality in HPC-based systems by using the HDFS backend","authors":"José Rivadeneira, Félix García Carballeira, J. Carretero, Francisco Javier García Blas","doi":"10.1109/HiPC50609.2020.00038","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00038","url":null,"abstract":"Nowadays, there are two main approaches for dealing with data-intensive applications: parallel file systems in classical High-Performance Computing (HPC) centers and Big Data-like parallel file systems for ensuring the data-centric vision. Furthermore, there is a growing overlap between HPC and Big Data applications, given that the Big Data paradigm is a growing consumer of HPC resources. HDFS is one of the most important file systems for data-intensive applications while, from the parallel file systems point of view, MPI-IO is the most used interface for parallel I/O. In this paper, we propose a novel solution for taking advantage of HDFS through MPI-based parallel applications. To demonstrate its feasibility, we have included our approach in MIMIR, a MapReduce framework for MPI-based applications. We have optimized the MIMIR framework with the data locality features provided by our approach. The experimental evaluation demonstrates that our solution offers around a 25% performance improvement for the map phase compared with the MIMIR baseline solution.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133645128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Fair Allocation of Asymmetric Operations in Storage Systems
Thomas Keller, P. Varman
{"title":"Fair Allocation of Asymmetric Operations in Storage Systems","authors":"Thomas Keller, P. Varman","doi":"10.1109/HiPC50609.2020.00030","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00030","url":null,"abstract":"Managing the trade-off between efficiency and fairness in a storage system is challenging due to high variability in workload behavior. Most workloads are made up of a mix of asymmetric operations (e.g. read/write, sequential/random, or striped/isolated I/Os) in different proportions, which places different resource demands on the storage device. The problem is to allocate device resources to the heterogeneous workloads fairly while maintaining high device throughput. In this paper, we present a new model for fair allocation of heterogeneous workloads with different ratios of asymmetric operations. We propose an adaptive scheme that chooses between two policies-the traditional Time-Balanced Allocation (TBA) and our proposed Bottleneck-Balanced Allocation (BBA)-based on workload characteristics. The fairness and throughput of these allocation policies are established through formal analysis. Our algorithms are tested with an adaptive, dynamic scheduler implemented in a simulation testbed, and the results validate the performance benefits of our approach.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129346112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
SimGQ: Simultaneously Evaluating Iterative Graph Queries
Chengshuo Xu, Abbas Mazloumi, Xiaolin Jiang, Rajiv Gupta
{"title":"SimGQ: Simultaneously Evaluating Iterative Graph Queries","authors":"Chengshuo Xu, Abbas Mazloumi, Xiaolin Jiang, Rajiv Gupta","doi":"10.1109/HiPC50609.2020.00014","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00014","url":null,"abstract":"Graph processing frameworks are typically designed to optimize the evaluation of a single graph query. However, in practice, we often need to respond to multiple graph queries, either from different users or from a single user performing a complex analytics task. Therefore in this paper we develop SimGQ, a system that optimizes simultaneous evaluation of a group of vertex queries that originate at different source vertices (e.g., multiple shortest path queries originating at different source vertices) and delivers substantial speedups over a conventional framework that evaluates and responds to queries one by one. The performance benefits are achieved via batching and sharing. Batching fully utilizes system resources to evaluate a batch of queries and amortizes runtime overheads incurred due to fetching vertices and edge lists, synchronizing threads, and maintaining computation frontiers. Sharing dynamically identifies shared queries that substantially represent subcomputations in the evaluation of different queries in a batch, evaluates the shared queries, and then uses their results to accelerate the evaluation of all queries in the batch. With four input power-law graphs and four graph algorithms SimGQ achieves speedups of up to 45.67 × with batch sizes of up to 512 queries over the baseline implementation that evaluates the queries one by one using the state of the art Ligra system. Moreover, both batching and sharing contribute substantially to the speedups.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115342422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
[Title page]
{"title":"[Title page]","authors":"","doi":"10.1109/hipc50609.2020.00001","DOIUrl":"https://doi.org/10.1109/hipc50609.2020.00001","url":null,"abstract":"","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128891764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Content-defined Merkle Trees for Efficient Container Delivery
Yuta Nakamura, Raza Ahmad, T. Malik
{"title":"Content-defined Merkle Trees for Efficient Container Delivery","authors":"Yuta Nakamura, Raza Ahmad, T. Malik","doi":"10.1109/HiPC50609.2020.00026","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00026","url":null,"abstract":"Containerization simplifies the sharing and deployment of applications when environments change in the software delivery chain. To deploy an application, container delivery methods push and pull container images. These methods operate on file and layer (set of files) granularity, and introduce redundant data within a container. Several container operations such as upgrading, installing, and maintaining become inefficient, because of copying and provisioning of redundant data. In this paper, we reestablish recent results that block-level deduplication reduces the size of individual containers, by verifying the result using content-defined chunking. Block-level deduplication, however, does not improve the efficiency of push/pull operations which must determine the specific blocks to transfer. We introduce a content-defined Merkle Tree (CDMT) over deduplicated storage in a container. CDMT indexes deduplicated blocks and determines changes to blocks in logarithmic time on the client. CDMT efficiently pushes and pulls container images from a registry, especially as containers are upgraded and (re-)provisioned on a client. We also describe how a registry can efficiently maintain the CDMT index as new image versions are pushed. We show the scalability of CDMT over Merkle Trees in terms of disk and network I/O savings using 15 container images and 233 image versions from Docker Hub.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122257709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
HiPC 2020 ORGANIZATION
{"title":"HiPC 2020 ORGANIZATION","authors":"","doi":"10.1109/hipc50609.2020.00007","DOIUrl":"https://doi.org/10.1109/hipc50609.2020.00007","url":null,"abstract":"","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"35 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114108943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HyPR: Hybrid Page Ranking on Evolving Graphs
Hemant Kumar Giri, Mridul Haque, D. Banerjee
{"title":"HyPR: Hybrid Page Ranking on Evolving Graphs","authors":"Hemant Kumar Giri, Mridul Haque, D. Banerjee","doi":"10.1109/HiPC50609.2020.00020","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00020","url":null,"abstract":"PageRank (PR) is the standard metric used by the Google search engine to compute the importance of a web page via modeling the entire web as a first order Markov chain. The challenge of computing PR efficiently and quickly has already been addressed by several previous works that have shown innovations in both algorithms and in the use of parallel computing. The standard method of computing PR is handled by modelling the web as a graph. The fast growing internet adds several new web pages every day and hence more nodes (representing the web pages) and edges (the hyperlinks) are added to this graph in an incremental fashion. Computing PR on this evolving graph is now an emerging challenge since computing from scratch on the massive graph is time consuming and unscalable. In this work, we propose Hybrid Page Rank (HyPR), which computes PR on evolving graphs using collaborative executions on multi-core CPUs and massively parallel GPUs. We exploit data parallelism via efficiently partitioning the graph into different regions that are affected and unaffected by the new updates. The different partitions are then processed in an overlapped manner for PR updates. The novelty of our technique is in utilizing the hybrid platform to scale the solution to massive graphs. The technique also provides high performance through parallel processing of every batch of updates using a parallel algorithm. HyPR efficiently executes on an NVIDIA V100 GPU hosted on a 6th Gen Intel Xeon CPU and is able to update a graph with 640M edges with a single batch of 100,000 edges in 12 ms. HyPR outperforms other state of the art techniques for computing PR on evolving graphs [1] by 4.8x. Additionally HyPR provides 1.2x speedup over GPU only executions, and 95x speedup over CPU only parallel executions.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132546601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Accelerating Force-directed Graph Layout with Processing-in-Memory Architecture
Ruihao Li, Shuang Song, Qinzhe Wu, L. John
{"title":"Accelerating Force-directed Graph Layout with Processing-in-Memory Architecture","authors":"Ruihao Li, Shuang Song, Qinzhe Wu, L. John","doi":"10.1109/HiPC50609.2020.00041","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00041","url":null,"abstract":"In the big data domain, the visualization of graph systems provides users more intuitive experiences, especially in the field of social networks, transportation systems, and even medical and biological domains. Processing-in-Memory (PIM) has been a popular choice for deploying emerging applications as a result of its high parallelism and low energy consumption. Furthermore, memory cells of PIM platforms can serve as both compute units and storage units, making PIM solutions able to efficiently support visualizing graphs at different scales. In this paper, we focus on using the PIM platform to accelerate the Force-directed Graph Layout (FdGL) algorithm, which is one of the most fundamental algorithms in the field of visualization. We fully explore the parallelism inside the FdGL algorithm and integrate an algorithm level optimization strategy into our PIM system. In addition, we use programmable instruction sets to achieve more flexibility in our PIM system. Our PIM architecture can achieve 8.07× speedup compared with a GPU platform of the same peak throughput. Compared with state-of-the-art CPU and GPU platforms, our PIM system can achieve an average of 13.33× and 2.14× performance speedup with 74.51× and 14.30× energy consumption reduction on six real world graphs.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131022316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Batched Small Tensor-Matrix Multiplications on GPUs
Keke Zhai, Tania Banerjee-Mishra, A. Wijayasiri, S. Ranka
{"title":"Batched Small Tensor-Matrix Multiplications on GPUs","authors":"Keke Zhai, Tania Banerjee-Mishra, A. Wijayasiri, S. Ranka","doi":"10.1109/HiPC50609.2020.00044","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00044","url":null,"abstract":"We present a fine-tuned library, ZTMM, for batched small tensor-matrix multiplication on GPU architectures. Libraries performing optimized matrix-matrix multiplications involving large matrices are available for many architectures, including a GPU. However, these libraries do not provide optimal performance for applications requiring efficient multiplication of a matrix with a batch of small matrices or tensors. There has been recent interest in developing fine-tuned libraries for batched small matrix-matrix multiplication - these efforts are limited to square matrices. ZTMM supports both square and rectangular matrices. We experimentally demonstrate that our library has significantly higher performance than cuBLAS and Magma libraries. We demonstrate our library's use on a spectral element-based solver called CMT-nek that performs high-fidelity predictive simulations using compressible Navier-Stokes equations. CMT-nek involves three-dimensional tensors, but it is possible to apply the same techniques to higher dimensional tensors.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123108525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Temporal Based Intelligent LRU Cache Construction
Pavan Nittur, Anuradha Kanukotla, Narendra Mutyala
{"title":"Temporal Based Intelligent LRU Cache Construction","authors":"Pavan Nittur, Anuradha Kanukotla, Narendra Mutyala","doi":"10.1109/HiPC50609.2020.00045","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00045","url":null,"abstract":"In the Android platform, the cache-slots store applications upon their launch, which it later uses for prefetching. The Least Recently Used (LRU) based caching algorithm which governs these cache-slots can fail to maintain essential applications in the slot, especially in scenarios like memory-crunch, temporal-burst or volatile environment situations. The construction of these cache-slots can be ameliorated by selectively storing user critical applications before their launch. This reform would require a successful forecast of the user-app-launch pattern using intelligent machine learning agents without hindering the smooth execution of parallel processes. In this paper, we propose a sophisticated Temporal based Intelligent Process Management (TIPM) system, which learns to predict a Smart Application List (SAL) based on the usage pattern. Using SAL, we construct Intelligent LRU cache-slots that retain essential user applications in memory and provide improved launch rates. Our experimental results from testing TIPM with different users demonstrate a significant improvement in cache-hit rate (95%), yielding a gain of 26% over the current baseline (LRU), thereby making it a valuable enhancement to the platform.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133627930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0