2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): Latest Publications

A Locality-aware Cooperative Distributed Memory Caching for Parallel Data Analytic Applications
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00183
Chia-Ting Hung, J. Chou, Ming-Hung Chen, I. Chung
{"title":"A Locality-aware Cooperative Distributed Memory Caching for Parallel Data Analytic Applications","authors":"Chia–Ting Hung, J. Chou, Ming-Hung Chen, I. Chung","doi":"10.1109/IPDPSW55747.2022.00183","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00183","url":null,"abstract":"Memory caching has long been used to fill up the performance gap between processor and disk for reducing the data access time of data-intensive computations. Previous studies on caching mostly focus on optimizing the hit rate of a single machine. But in this paper, we argue that the caching decision of a distributed memory system should be performed in a cooperative manner for the parallel data analytic applications, which are commonly used by emerging technologies, such as Big Data and AI (Artificial Intelligence), to perform data mining and sophisticated analytics on larger data volume in a shorter time. A parallel data analytic job consists of multiple parallel tasks. Hence, the completion time of a job is bounded by its slowest task, meaning that the job cannot benefit from caching until all inputs of its tasks are cached. To address the problem, we proposed a cooperative caching design that periodically rearranges the cache placement among nodes according to the data access pattern while taking the task dependency and network locality into account. Our approach is evaluated by a trace-driven simulator using both synthetic workload and real-world traces. The results show that we can reduce the average completion times up to 33% compared to a non-collaborative caching polices and 25% compared to other start-of-the-art collaborative caching policies.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121684445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
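The periodic rearrangement step described in the abstract above can be pictured with a toy placement routine. The sketch below is a hypothetical simplification in Go (the `Block`/node types and the greedy policy are assumptions, not the authors' implementation): it sorts blocks by access frequency and places each hot block on the node that reads it most, falling back to any node with free capacity.

```go
package main

import (
	"fmt"
	"sort"
)

// Block is a cacheable input partition; AccessCount[node] counts how often
// each node's tasks read the block (a stand-in for the observed access pattern).
type Block struct {
	ID          string
	Size        int
	AccessCount map[string]int
}

func total(b Block) int {
	t := 0
	for _, c := range b.AccessCount {
		t += c
	}
	return t
}

// placeGreedy assigns blocks to node caches, hottest blocks first, preferring
// the node that reads a block most often (locality) and falling back to any
// node with spare capacity. It returns a blockID -> nodeID placement.
func placeGreedy(blocks []Block, capacity map[string]int) map[string]string {
	sort.Slice(blocks, func(i, j int) bool { return total(blocks[i]) > total(blocks[j]) })
	placement := map[string]string{}
	free := map[string]int{}
	for n, c := range capacity {
		free[n] = c
	}
	for _, b := range blocks {
		best, bestCount := "", -1
		for n, c := range b.AccessCount {
			if free[n] >= b.Size && c > bestCount {
				best, bestCount = n, c
			}
		}
		if best == "" { // no preferred node has room; take any node that fits
			for n, f := range free {
				if f >= b.Size {
					best = n
					break
				}
			}
		}
		if best != "" {
			placement[b.ID] = best
			free[best] -= b.Size
		}
	}
	return placement
}

func main() {
	blocks := []Block{
		{"b1", 2, map[string]int{"n1": 10, "n2": 1}},
		{"b2", 2, map[string]int{"n2": 7}},
	}
	fmt.Println(placeGreedy(blocks, map[string]int{"n1": 2, "n2": 4}))
}
```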
CORtEX 2022 Invited Speaker 4: Large-scale simulations of mammalian brains using peta- to exa-scale computing
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00216
J. Igarashi
{"title":"CORtEX 2022 Invited Speaker 4: Large-scale simulations of mammalian brains using peta- to exa-scale computing","authors":"J. Igarashi","doi":"10.1109/IPDPSW55747.2022.00216","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00216","url":null,"abstract":"A whole-brain simulation allows us to investigate all interactions among neurons in the brain to understand the mechanisms of information processing and brain diseases. The computational performance of exascale supercomputers in the 2020s is estimated to realize whole-brain simulation at a human scale. However, it has not been realized to sufficiently reproduce and predict neural behaviors and functionality of the whole brain due to the lack of computational resources, physiological and anatomical data, brain models, and neural network simulators. We have studied large-scale brain simulations with various supercomputers toward whole brain simulations. In this talk, we will introduce studies on developing efficient spiking neural simulators, modeling brain disease, and large-scale simulations of the cortico-cerebello-thalamic circuit using the supercomputer Fugaku.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121695712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distributed Algorithms for the Graph Biconnectivity and Least Common Ancestor Problems
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00187
Ian Bogle, George M. Slota
{"title":"Distributed Algorithms for the Graph Biconnectivity and Least Common Ancestor Problems","authors":"Ian Bogle, George M. Slota","doi":"10.1109/IPDPSW55747.2022.00187","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00187","url":null,"abstract":"Graph connectivity analysis is one of the primary ways to analyze the topological structure of social networks. Graph biconnectivity decompositions are of particular interest due to how they identify cut vertices and cut edges in a network. We present the first, to our knowledge, implementation of a distributed-memory parallel biconnectivity algorithm. As part of our algorithm, we also require the computation of least common ancestors (LCAs) of non-tree edge endpoints in a BFS tree. As such, we also propose a novel distributed algorithm for the LCA problem. Using our implementations, we observe up to a 14.8× speedup from 1 to 128 MPI ranks for computing a biconnectivity decomposition.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"8 Pt 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126270743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
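For readers unfamiliar with the LCA subproblem mentioned above, the sketch below shows the standard sequential formulation that a distributed algorithm has to reproduce: given BFS parent pointers and depths, lift the deeper endpoint until both endpoints meet. This is a minimal shared-memory illustration in Go, not the paper's distributed algorithm.

```go
package main

import "fmt"

// lca returns the least common ancestor of u and v in a rooted BFS tree,
// given parent pointers and depths (parent[root] == root, depth[root] == 0).
func lca(parent, depth []int, u, v int) int {
	// Lift the deeper vertex until both are at the same depth.
	for depth[u] > depth[v] {
		u = parent[u]
	}
	for depth[v] > depth[u] {
		v = parent[v]
	}
	// Walk both up in lockstep until they coincide.
	for u != v {
		u, v = parent[u], parent[v]
	}
	return u
}

func main() {
	// A small tree: 0 is the root, 1 and 2 are its children, 3 and 4 hang off 1.
	parent := []int{0, 0, 0, 1, 1}
	depth := []int{0, 1, 1, 2, 2}
	fmt.Println(lca(parent, depth, 3, 4)) // 1
	fmt.Println(lca(parent, depth, 3, 2)) // 0
}
```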
Modeling Power Consumption of Lossy Compressed I/O for Exascale HPC Systems
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00184
Grant Wilkins, Jon C. Calhoun
{"title":"Modeling Power Consumption of Lossy Compressed I/O for Exascale HPC Systems","authors":"Grant Wilkins, Jon C. Calhoun","doi":"10.1109/IPDPSW55747.2022.00184","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00184","url":null,"abstract":"Exascale computing enables unprecedented, detailed and coupled scientific simulations which generate data on the order of tens of petabytes. Due to large data volumes, lossy compressors become indispensable as they enable better compression ratios and runtime performance than lossless compressors. Moreover, as (high-performance computing) HPC systems grow larger, they draw power on the scale of tens of megawatts. Data motion is expensive in time and energy. Therefore, optimizing compressor and data I/O power usage is an important step in reducing energy consumption to meet sustainable computing goals and stay within limited power budgets. In this paper, we explore efficient power consumption gains for the SZ and ZFP lossy compressors and data writing on a cloud HPC system while varying the CPU frequency, scientific data sets, and system architecture. Using this power consumption data, we construct a power model for lossy compression and present a tuning methodology that reduces energy overhead of lossy compressors and data writing on HPC systems by 14.3% on average. We apply our model and find 6.5 kJ s, or 13 %, of savings on average for 512GB I/O. Therefore, utilizing our model results in more energy efficient lossy data compression and I/O.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126935991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
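The kind of frequency-tuning decision such a model supports can be sketched as a small search over candidate CPU frequencies, where each candidate's energy is its power draw times its predicted compression-plus-write time. The numbers and the `Candidate` structure below are placeholder assumptions for illustration only, not the paper's fitted model.

```go
package main

import "fmt"

// Candidate describes one CPU frequency setting: its package power draw and
// the measured (or modeled) time to compress and write one dataset at that setting.
type Candidate struct {
	FreqGHz float64
	PowerW  float64
	TimeS   float64
}

// minEnergy returns the candidate whose energy (power x time, in joules) is lowest.
func minEnergy(cands []Candidate) (Candidate, float64) {
	best, bestE := cands[0], cands[0].PowerW*cands[0].TimeS
	for _, c := range cands[1:] {
		if e := c.PowerW * c.TimeS; e < bestE {
			best, bestE = c, e
		}
	}
	return best, bestE
}

func main() {
	// Placeholder measurements: lower frequency draws less power but runs longer.
	cands := []Candidate{
		{3.0, 95, 40},
		{2.4, 70, 48},
		{1.8, 50, 65},
	}
	c, e := minEnergy(cands)
	fmt.Printf("pick %.1f GHz, est. %.0f J\n", c.FreqGHz, e)
}
```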
A Methodology to Build Decision Analysis Tools Applied to Distributed Reinforcement Learning
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00173
Cèdric Prigent, Loïc Cudennec, Alexandru Costan, Gabriel Antoniu
{"title":"A Methodology to Build Decision Analysis Tools Applied to Distributed Reinforcement Learning","authors":"Cèdric Prigent, Loïc Cudennec, Alexandru Costan, Gabriel Antoniu","doi":"10.1109/IPDPSW55747.2022.00173","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00173","url":null,"abstract":"As Artificial Intelligence-based applications become more and more complex, speeding up the learning phase (which is typically computation-intensive) becomes more and more necessary. Distributed machine learning (ML) appears adequate to address this problem. Unfortunately, ML also brings new development frameworks, methodologies and high-level program-ming languages that do not fit to the regular high-performance computing design flow. This paper introduces a methodology to build a decision making tool that allows ML experts to arbitrate between different frameworks and deployment configurations, in order to fulfill project objectives such as the accuracy of the resulting model, the computing speed or the energy consumption of the learning computation. The proposed methodology is applied to an industrial-grade case study in which reinforcement learning is used to train an autonomous steering model for a cargo airdrop system. Results are presented within a Pareto front that lets ML experts choose an appropriate solution, a framework and a deployment configuration, based on the current operational situation. While the proposed approach can effortlessly be applied to other machine learning problems, as for many decision making systems, the selected solutions involve a trade-off between several antagonist evaluation criteria and require experts from different domains to pick the most efficient solution from the short list. Nevertheless, this methodology speeds up the development process by clearly discarding, or, on the contrary, including combinations of frameworks and configurations, which has a significant impact for time and budget-constrained projects.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125910431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
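The Pareto front such a tool presents to ML experts can be illustrated with a small dominance filter over candidate (framework, configuration) points. The criteria, the "higher accuracy, lower time and energy is better" convention, and the candidate names below are all invented for the sketch; the paper's tool works on its own evaluation criteria.

```go
package main

import "fmt"

// Candidate is one framework/deployment configuration with its evaluated criteria.
type Candidate struct {
	Name     string
	Accuracy float64 // higher is better
	TimeH    float64 // lower is better
	EnergyKJ float64 // lower is better
}

// dominates reports whether a is at least as good as b on every criterion
// and strictly better on at least one.
func dominates(a, b Candidate) bool {
	geq := a.Accuracy >= b.Accuracy && a.TimeH <= b.TimeH && a.EnergyKJ <= b.EnergyKJ
	gt := a.Accuracy > b.Accuracy || a.TimeH < b.TimeH || a.EnergyKJ < b.EnergyKJ
	return geq && gt
}

// paretoFront keeps only candidates not dominated by any other candidate.
func paretoFront(cands []Candidate) []Candidate {
	var front []Candidate
	for i, c := range cands {
		dominated := false
		for j, d := range cands {
			if i != j && dominates(d, c) {
				dominated = true
				break
			}
		}
		if !dominated {
			front = append(front, c)
		}
	}
	return front
}

func main() {
	cands := []Candidate{
		{"framework-A-cpu", 0.91, 6.0, 120},
		{"framework-A-gpu", 0.92, 2.5, 200},
		{"framework-B-cpu", 0.90, 8.0, 300}, // dominated by framework-A-cpu
	}
	fmt.Println(paretoFront(cands))
}
```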
A Customizable Lightweight STM for Irregular Algorithms on GPU
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00098
Shayan Manoochehri, Patrick Cristofaro, D. Goswami
{"title":"A Customizable Lightweight STM for Irregular Algorithms on GPU","authors":"Shayan Manoochehri, Patrick Cristofaro, D. Goswami","doi":"10.1109/IPDPSW55747.2022.00098","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00098","url":null,"abstract":"Irregular algorithms are often encountered in highly data-centric application domains. These algorithms operate on irregular data structures such as sparse graphs with irregular access patterns, which may also modify the underlying topology unpredictably. High computational time and inherent data parallelism present in these algorithms motivate the use of GPUs for speeding things up, however there are challenges for their efficient implementations due to: difficulty in protecting the shared data consistency in the presence of concurrent dynamic transactions; irregular access patterns due to unstructured data structures; and dynamic structural modifications of the underlying topology. One approach to overcome these challenges is to use Software Transactional Memory (STM). However, overly complex design and implementations of contemporary STM-based approaches and lack of proper framework to employ them in conjunction with the irregular algorithms stalls their adoption by the programming community. To overcome some of these challenges, this research proposes a lightweight STM with a simple design (Lite GSTM), based on a lock stealing algorithm, and an associated extensible framework to hide the complexity of the STM from a programmer. The framework is extensible by allowing plug-ins of customized STMs designed for different needs of transactions. The use of the framework is elaborated with two use cases which employ completely different irregular algorithms, however, have some common features: the underlying data structure is a graph, and the graph is structurally modified (coarsened) unpredictably in the course of execution. The paper presents the performance comparisons of the STM-based implementations with respect to their sequential and non-STM based counterparts, which show promising results.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126558746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
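The "extensible framework with pluggable STMs" idea can be pictured with a small interface-based design: application code runs transactions through a generic `Atomic` entry point, and different concurrency-control schemes plug in behind it. The Go sketch below uses a trivial global-lock "STM" as the plug-in; it only illustrates the framework shape, not Lite GSTM's lock-stealing algorithm or its GPU implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Txn is the handle a transaction body uses to read and write shared cells.
type Txn interface {
	Read(addr *int) int
	Write(addr *int, v int)
}

// STM is the plug-in point: any scheme that can run a transaction body
// atomically (possibly retrying it) satisfies this interface.
type STM interface {
	Atomic(body func(tx Txn))
}

// globalLockSTM is the simplest possible plug-in: one big lock around every
// transaction. A customized STM (e.g. one based on lock stealing) would
// implement the same interface.
type globalLockSTM struct{ mu sync.Mutex }

type directTxn struct{}

func (directTxn) Read(addr *int) int     { return *addr }
func (directTxn) Write(addr *int, v int) { *addr = v }

func (s *globalLockSTM) Atomic(body func(tx Txn)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	body(directTxn{})
}

func main() {
	var stm STM = &globalLockSTM{}
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			stm.Atomic(func(tx Txn) { tx.Write(&counter, tx.Read(&counter)+1) })
		}()
	}
	wg.Wait()
	fmt.Println(counter) // always 100: the framework made the increments atomic
}
```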
Automatic Parallelization of Programs via Software Stream Rewriting
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00094
Tao Tao, D. Plaisted
{"title":"Automatic Parallelization of Programs via Software Stream Rewriting","authors":"Tao Tao, D. Plaisted","doi":"10.1109/IPDPSW55747.2022.00094","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00094","url":null,"abstract":"We introduce a system for automatically paral-lelizing programs using a parallel-by-default language based on stream rewriting. Our method is general and supports all programs that can be written in a typical high-level, imperative language. The technique is fine-grained and fully automatic. It requires no programmer annotation, static analysis, runtime profiling, or cutoff schemes. The only assumption is that all function arguments in the input program can be executed in parallel. This does not affect the generality of our system since the programmers can write sequential parts in continuation-passing style. Experiments show that the runtime can scale computation-bound programs up to 16 cores without performance degradation. Future works remain to improve key aspects of the runtime and further increase the system's performance.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128130960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
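The core assumption stated in the abstract, that all function arguments may be evaluated in parallel, is easy to picture with Go goroutines: each argument expression runs concurrently and the call waits for all of them. This sketch is only an analogy for the execution model; the paper's system is built on stream rewriting, not goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// evalArgsParallel evaluates every argument thunk concurrently and returns
// the results in order, mirroring the "arguments are parallel by default" rule.
func evalArgsParallel(args []func() int) []int {
	results := make([]int, len(args))
	var wg sync.WaitGroup
	for i, arg := range args {
		wg.Add(1)
		go func(i int, arg func() int) {
			defer wg.Done()
			results[i] = arg()
		}(i, arg)
	}
	wg.Wait()
	return results
}

func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

func main() {
	// f(fib(30), fib(31), fib(32)): the three argument expressions run concurrently.
	args := []func() int{
		func() int { return fib(30) },
		func() int { return fib(31) },
		func() int { return fib(32) },
	}
	fmt.Println(evalArgsParallel(args))
}
```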
Towards a GraphBLAS Implementation for Go
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00052
Pascal Costanza, I. Hur, T. Mattson
{"title":"Towards a GraphBLAS Implementation for Go","authors":"Pascal Costanza, I. Hur, T. Mattson","doi":"10.1109/IPDPSW55747.2022.00052","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00052","url":null,"abstract":"The GraphBLAS are building blocks for constructing graph algorithms as linear algebra. They are defined mathematically with the goal that they would eventually map onto a variety of programming languages. Today they exist in C, C++, Python, MATLAB®, and Julia. In this paper, we describe the GraphBLAS for the Go programming language. A particularly interesting aspect of this work is that using the concurrency features of the Go language, we aim to build a runtime system that uses the GraphBLAS nonblocking mode by default.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128218219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
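One way to picture how Go's concurrency features could support GraphBLAS nonblocking mode: an operation returns immediately and materializes its result lazily in a goroutine, with a channel used to wait only when the value is actually needed. The `future`, `async`, and `vecAdd` names below are hypothetical illustrations of this idea, not the API of the GraphBLAS Go binding described in the paper.

```go
package main

import "fmt"

// future holds a result that is being computed asynchronously; Wait blocks
// until the computation has finished (a crude stand-in for nonblocking mode,
// where GraphBLAS calls may return before the operation has executed).
type future[T any] struct {
	done chan struct{}
	val  T
}

func async[T any](f func() T) *future[T] {
	fu := &future[T]{done: make(chan struct{})}
	go func() {
		fu.val = f()
		close(fu.done)
	}()
	return fu
}

func (f *future[T]) Wait() T {
	<-f.done
	return f.val
}

// vecAdd is a toy element-wise operation standing in for a GraphBLAS-style kernel.
func vecAdd(a, b []float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

func main() {
	a, b := []float64{1, 2, 3}, []float64{10, 20, 30}
	w := async(func() []float64 { return vecAdd(a, b) }) // returns immediately
	// ... other work could overlap here ...
	fmt.Println(w.Wait()) // materialize the result only when it is needed
}
```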
ReconOS64: A Hardware Operating System for Modern Platform FPGAs with 64-Bit Support
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00029
L. Clausing, M. Platzner
{"title":"ReconOS64: A Hardware Operating System for Modern Platform FPGAs with 64-Bit Support","authors":"L. Clausing, M. Platzner","doi":"10.1109/IPDPSW55747.2022.00029","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00029","url":null,"abstract":"Reconfigurable hardware operating systems provide software-like abstractions for hardware accelerators. In particu-lar abstractions that view hardware accelerators as threads and integrate them into a multi-threaded environment have received popularity. However, such abstractions are not yet available for latest platform FPGAs. In this paper, we present ReconOS64, a reconfigurable hard-ware operating system for 64-Bit modern platform FPGAs. We discuss the architecture and the build flow and report on a number of experiments that evaluate the performance of the system. In particular, we compare the performance to a previous, 32- Bit ReconOS system. The evaluation shows that the step towards 64- Bit is not only necessary to make hardware operating system support available for modern platform FPGAs, but also improves the performance of operating system calls and memory accesses for hardware threads.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127263776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Modeling Memory Contention between Communications and Computations in Distributed HPC Systems
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00086
Alexandre Denis, E. Jeannot, Philippe Swartvagher
{"title":"Modeling Memory Contention between Communications and Computations in Distributed HPC Systems","authors":"Alexandre Denis, E. Jeannot, Philippe Swartvagher","doi":"10.1109/IPDPSW55747.2022.00086","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00086","url":null,"abstract":"To amortize the cost of MPI communications, distributed parallel HPC applications can overlap network communications with computations in the hope that it improves global application performance. When using this technique, both computations and communications are running at the same time. But computation usually also performs some data movements. Since data for computations and for communications use the same memory system, memory contention may occur when computations are memory-bound and large messages are transmitted through the network at the same time. In this paper we propose a model to predict memory band-width for computations and for communications when they are executed side by side, according to data locality and taking contention into account. Elaboration of the model allowed to better understand locations of bottleneck in the memory system and what are the strategies of the memory system in case of contention. The model was evaluated on many platforms with different characteristics, and showed a prediction error in average lower than 4 %.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"83 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133784500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
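A toy version of the question this paper models: if computation alone sustains Bc GB/s and communication alone Bn GB/s, what does each get when they run side by side on a memory system that tops out at Btotal GB/s? The proportional-sharing rule below is a deliberately naive assumption used only for illustration; the paper's model additionally accounts for data locality and the memory system's actual arbitration behavior.

```go
package main

import "fmt"

// sharedBandwidth predicts the bandwidth each activity obtains when computation
// (solo bandwidth bc) and communication (solo bandwidth bn) contend for a memory
// system limited to btotal, by scaling both down proportionally if their sum
// exceeds the limit. All values are in GB/s.
func sharedBandwidth(bc, bn, btotal float64) (compBW, commBW float64) {
	if bc+bn <= btotal {
		return bc, bn // no contention: both run at full speed
	}
	scale := btotal / (bc + bn)
	return bc * scale, bn * scale
}

func main() {
	comp, comm := sharedBandwidth(80, 40, 100)
	fmt.Printf("computation: %.1f GB/s, communication: %.1f GB/s\n", comp, comm)
}
```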