2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid): Latest Papers

Optimizing Decentralized Learning with Local Heterogeneity using Topology Morphing and Clustering
Waqwoya Abebe, A. Jannesari
{"title":"Optimizing Decentralized Learning with Local Heterogeneity using Topology Morphing and Clustering","authors":"Waqwoya Abebe, A. Jannesari","doi":"10.1109/CCGrid57682.2023.00041","DOIUrl":"https://doi.org/10.1109/CCGrid57682.2023.00041","url":null,"abstract":"Recently, local peer topology has been shown to influence the overall convergence of decentralized learning (DL) graphs in the presence of data heterogeneity. In this paper, we demonstrate the advantages of constructing a proxy-based locally heterogeneous DL topology to enhance convergence and maintain data privacy. In particular, we propose a novel peer clumping strategy to efficiently cluster peers before arranging them in a final training graph. By showing how locally heterogeneous graphs outperform locally homogeneous graphs of similar size and from the same global data distribution, we present a strong case for topological pre-processing. Moreover, we demonstrate the scalability of our approach by showing how the proposed topological pre-processing overhead remains small in large graphs while the performance gains get even more pronounced. Furthermore, we show the robustness of our approach in the presence of network partitions.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115345470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Optical Transceiver Reliability Study based on SFP Monitoring and OS-level Metric Data
Paolo Notaro, Qiao Yu, Soroush Haeri, Jorge Cardoso, M. Gerndt
{"title":"An Optical Transceiver Reliability Study based on SFP Monitoring and OS-level Metric Data","authors":"Paolo Notaro, Qiao Yu, Soroush Haeri, Jorge Cardoso, M. Gerndt","doi":"10.1109/CCGrid57682.2023.00011","DOIUrl":"https://doi.org/10.1109/CCGrid57682.2023.00011","url":null,"abstract":"The increasing demand for cloud computing drives the expansion in scale of datacenters and their internal optical network, in a strive for increasing bandwidth, high reliability, and lower latency. Optical transceivers are essential elements of optical networks, whose reliability has not been well-studied compared to other hardware components. In this paper, we leverage high quantities of monitoring data from optical transceivers and OS-level metrics to provide statistical insights about the occurrence of optical transceiver failures. We estimate transceiver failure rates and normal operating ranges for monitored attributes, correlate early-observable patterns to known failure symptoms, and finally develop failure prediction models based on our analyses. Our results enable network administrators to deploy early-warning systems and enact predictive maintenance strategies, such as replacement or traffic re-routing, reducing the number of incidents and their associated costs.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131995517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
FreeTrain: A Framework to Utilize Unused Supercomputer Nodes for Training Neural Networks
Zhengchun Liu, R. Kettimuthu, M. Papka, Ian T. Foster
{"title":"FreeTrain: A Framework to Utilize Unused Supercomputer Nodes for Training Neural Networks","authors":"Zhengchun Liu, R. Kettimuthu, M. Papka, Ian T. Foster","doi":"10.1109/CCGrid57682.2023.00036","DOIUrl":"https://doi.org/10.1109/CCGrid57682.2023.00036","url":null,"abstract":"Supercomputer scheduling policies commonly result in many transient idle nodes, a phenomenon that is only partially alleviated by backfill scheduling methods that promote small jobs to run before large jobs. Here we describe how to realize a novel use for these otherwise wasted resources, namely, deep neural network (DNN) training. This important workload is easily organized as many small fragments that can be configured dynamically to fit essentially any node × time hole in a supercomputer's schedule. We describe how the task of rescaling suitable DNN training tasks to fit dynamically changing holes can be formulated as a deterministic mixed integer linear programming (MILP)-based resource allocation algorithm, and show that this MILP problem can be solved efficiently at run time. We show further how this MILP problem can be adapted to optimize for administrator- or user-defined metrics. We validate our method with supercomputer scheduler logs and different DNN training scenarios, and demonstrate efficiencies of up to 93% compared with running the same training tasks on dedicated nodes. Our method thus enables substantial supercomputer resources to be allocated to DNN training with no impact on other applications.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128359555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Artifact Evaluation Committee Members
{"title":"Artifact Evaluation Committee Members","authors":"","doi":"10.1109/ccgrid57682.2023.00009","DOIUrl":"https://doi.org/10.1109/ccgrid57682.2023.00009","url":null,"abstract":"","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127030411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing
Philip Salzmann, Fabian Knorr, Peter Thoman, P. Gschwandtner, Biagio Cosenza, T. Fahringer
{"title":"An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing","authors":"Philip Salzmann, Fabian Knorr, Peter Thoman, P. Gschwandtner, Biagio Cosenza, T. Fahringer","doi":"10.1109/CCGrid57682.2023.00018","DOIUrl":"https://doi.org/10.1109/CCGrid57682.2023.00018","url":null,"abstract":"While domain-specific HPC software packages continue to thrive and are vital to many scientific communities, a general purpose high-productivity GPU cluster programming model that facilitates experimentation for non-experts remains elusive. We demonstrate how Celerity, a high-level C++ programming model for distributed accelerator computing based on the open SYCL standard, allows for the quick development of - and experimentation with - distributed applications. To achieve scalability on large machines, we replace Celerity's existing master/worker scheduling model with a fully distributed scheme that reduces the worst-case scheduling complexity from quadratic to linear while maintaining the existing programming interface. We then show how this declarative, data-flow based API paired with a point-to-point communication model with eager data pushing can effectively expose and leverage opportunities for latency hiding and computation/communication overlapping with minimal or no manual guidance. We demonstrate how Celerity exhibits very good scalability on multiple benchmarks from several scientific domains and up to 128 GPUs.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115599816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Blockchain Proportional Governance Reconfiguration: Mitigating a Governance Oligarchy
Deepal Tennakoon, V. Gramoli
{"title":"Blockchain Proportional Governance Reconfiguration: Mitigating a Governance Oligarchy","authors":"Deepal Tennakoon, V. Gramoli","doi":"10.1109/CCGrid57682.2023.00057","DOIUrl":"https://doi.org/10.1109/CCGrid57682.2023.00057","url":null,"abstract":"Blockchain governance is paramount to lead securely a large group of users towards the same decisions without disputes about the legitimacy of a blockchain instance over another. As of today, there is no efficient way of protecting this governance against an oligarchy. This paper aims to offer a new dimension to the security of blockchains by proposing a solution known as proportional governance reconfiguration. This solution mitigates the formation of an oligarchy by (1) electing governors proportionally using a proportional multi-winner election protocol (2) reconfiguring the governance automatically and periodically. The proportional governance reconfiguration relies on a Solidity based implementation making it compatible and usable in many smart contract supported blockchains. We prove our solution solves the proportional governance problem and we evaluate our solution on two smart contract supporting blockchains Ethereum-PoA and Smart Redbelly Blockchain. Our results indicate that our proportional governance can elect 200 governors within 6–12 minutes when 1000 voters from 5 continents vote for 500 candidates.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116179574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Use of Cost Surface Analysis and Stream Order Analysis for Computing Shortest Paths
Yogesh Dasgaonkar
{"title":"Use of Cost Surface Analysis and Stream Order Analysis for Computing Shortest Paths","authors":"Yogesh Dasgaonkar","doi":"10.1109/CCGrid57682.2023.00067","DOIUrl":"https://doi.org/10.1109/CCGrid57682.2023.00067","url":null,"abstract":"We find that the current state-of-the-art shortest path navigation systems have a computational bottleneck that limits their scalability. To solve this problem, our first contribution is an important result showing that two points in the environment relate to each other by more geometric criteria than just the distances between them. Our second contribution shows that the environment's geometry is such that it allows for the points in the environment to be uniquely distinguishable based on the length of the shortest paths meeting at that point. Using this result, we order the points, so their ordering uniquely distinguishes the shortest path between any source and destination pair. Through these two important results, we propose a system that solves the computational bottleneck problem using lower processing resources and has higher optimal efficiency than the state-of-the-art.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127160901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Implementing and Optimizing a GPU-aware MPI Library for Intel GPUs: Early Experiences
Chen-Chun Chen, Kawthar Shafie Khorassani, Goutham Kalikrishna Reddy Kuncham, Rahul Vaidya, M. Abduljabbar, A. Shafi, H. Subramoni, D. Panda
{"title":"Implementing and Optimizing a GPU-aware MPI Library for Intel GPUs: Early Experiences","authors":"Chen-Chun Chen, Kawthar Shafie Khorassani, Goutham Kalikrishna Reddy Kuncham, Rahul Vaidya, M. Abduljabbar, A. Shafi, H. Subramoni, D. Panda","doi":"10.1109/CCGrid57682.2023.00022","DOIUrl":"https://doi.org/10.1109/CCGrid57682.2023.00022","url":null,"abstract":"As the demand for computing power from High-Performance Computing (HPC) and Deep Learning (DL) applications increase, there is a growing trend of equipping modern exascale clusters with accelerators, such as NVIDIA and AMD GPUs. GPU-aware MPI libraries allow the applications to communicate between GPUs in a parallel environment with high productivity and performance. Although NVIDIA and AMD GPUs have dominated the accelerator market for top supercomputers over the past several years, Intel has recently developed and released its GPUs and associated software stack, and provided a unified programming model to program their GPUs, referred to as oneAPI. The emergence of Intel GPUs drives the need for initial MPI-level GPU-aware support that utilizes the underlying software stack specific to these GPUs and a thorough evaluation of communication. In this paper, we propose a GPU-aware MPI library for Intel GPUs using oneAPI and an SYCL backend. We delve into our experiments using Intel GPUs and the challenges to consider at the MPI layer when adding GPU-aware support using the software stack provided by Intel for their GPUs. We explore different memory allocation approaches and benchmark the memory copy performance with Intel GPUs. We propose implementations based on our experiments on Intel GPUs to support point-to-point GPU-aware MPI operations and show the high adaptability of our approach by extending the implementations to MPI collective operations, such as MPI_Bcast and MPI_Reduce. We evaluate the benefits of our implementations at the benchmark level by extending support for Intel GPU buffers over OSU Micro-Benchmarks. Our implementations provide up to 1.8x and 2.2x speedups on point-to-point latency using device buffers at small messages compared to Intel MPI and a naive benchmark, respectively; and have up to 1.3x and 1.5x speedups at large message sizes. At collective MPI operations, our implementations show 8x and 5x speedups for MPI_Allreduce and MPI_Allgather at large messages. At the application-level evaluation, our implementations provide up to 40% improvement for 3DStencil compared to Intel MPI.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116442319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Accelerating Hybrid DFT Simulations Using Performance Modeling on Supercomputers
Yosuke Oyama, Takumi Honda, Atsushi Ishikawa, Koichi Shirahata
{"title":"Accelerating Hybrid DFT Simulations Using Performance Modeling on Supercomputers","authors":"Yosuke Oyama, Takumi Honda, Atsushi Ishikawa, Koichi Shirahata","doi":"10.1109/ccgrid57682.2023.00055","DOIUrl":"https://doi.org/10.1109/ccgrid57682.2023.00055","url":null,"abstract":"Density Functional Theory (DFT) is an electronic-structure theory that computes the electronic energy of atoms and molecules from their electron density. Among several DFT methods, one called “hybrid DFT” adds the Hartree-Fock exchange energy to the original DFT exchange energy, and it improves the accuracy of the estimation of energy. However, this introduces additional computational costs, preventing its wide application for large-scale calculations. In light of those issues, a performance model to tune the computational configurations for hybrid DFT software automatically is proposed. The proposed model makes it possible to exhaustively search for parameters to minimize computation time without having to execute actual calculations with all parameter combinations. Several techniques for optimizing hybrid DFT, specially designed for the Fugaku supercomputer, are also proposed. It is concluded that combining all approaches reduces node-time cost by 2.23x and 2.68x for a 52-atom input on Fugaku and ABCI, respectively.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133483144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Chronica: A Data-Imbalance-Aware Scheduler for Distributed Deep Learning
Sanha Maeng, G. Moon, Sungyong Park
{"title":"Chronica: A Data-Imbalance-Aware Scheduler for Distributed Deep Learning","authors":"Sanha Maeng, G. Moon, Sungyong Park","doi":"10.1109/CCGrid57682.2023.00033","DOIUrl":"https://doi.org/10.1109/CCGrid57682.2023.00033","url":null,"abstract":"One of the major challenges in distributed deep learning is attenuating straggler problem. The straggler increases synchronization latency and significantly inhibits the convergence of deep learning model. We empirically observe that the imbalanced data samples worsen the straggler problem and make the convergence of the deep learning model slower. However, existing approaches such as BOA and EP4DDL have not addressed data imbalance issues while solving the straggler problem. To overcome the straggler and data imbalance problems, we propose Chronica, a new data-imbalance-aware scheduler. Based on the size of the data samples and the configuration of each worker, Chronica elaborately predicts the training time required for each worker. Chronica then provides equivalent training time to each of the workers, alleviating both step- and epoch-level straggler problems. Furthermore, Chronica suggests a new parameter synchronization scheme to achieve fast convergence based on the weighted average of the training workload on each worker. Our extensive evaluation using four deep learning models on 32 Amazon EC2 GPU instances showed that the new Chronica achieves up to 3.19 times speedup over the state-of-the-art systems.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131896820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0