2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)最新文献_第6页

Harnessing the Crowd for Autotuning High-Performance Computing Applications 利用人群自动调优高性能计算应用程序

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI: 10.1109/IPDPS54959.2023.00069

Younghyun Cho, J. Demmel, Jacob King, X. Li, Yang Liu, Hengrui Luo

引用次数: 3

Satellite Collision Detection using Spatial Data Structures 基于空间数据结构的卫星碰撞检测

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI: 10.1109/IPDPS54959.2023.00078

C. Hellwig, Fabian Czappa, Martin Michel, R. Bertrand, F. Wolf

{"title":"Satellite Collision Detection using Spatial Data Structures","authors":"C. Hellwig, Fabian Czappa, Martin Michel, R. Bertrand, F. Wolf","doi":"10.1109/IPDPS54959.2023.00078","DOIUrl":"https://doi.org/10.1109/IPDPS54959.2023.00078","url":null,"abstract":"In recent years, the number of artificial objects in Earth orbit has increased rapidly due to lower launch costs and new applications for satellites. More and more governments and private companies are discovering space for their own purposes. Private companies are using space as a new business field, launching thousands of satellites into orbit to offer services like worldwide Internet access. Consequently, the probability of collisions and, thus, the degradation of the orbital environment is rapidly increasing. To avoid devastating collisions at an early stage, efficient algorithms are required to identify satellites approaching each other. Traditional deterministic filter-based conjunction detection algorithms compare each satellite to every other satellite and pass them through a chain of orbital filters. Unfortunately, this leads to a runtime complexity of O(n2). In this paper, we propose two alternative approaches that rely on spatial data structures and thus allow us to exploit modern hardware’s parallelism efficiently. Firstly, we introduce a purely grid-based variant that relies on non-blocking atomic hash maps to identify conjunctions. Secondly, we present a hybrid method that combines this approach with traditional filter chains. Both implementations make it possible to identify conjunctions in a large population with millions of satellites with high precision in a comparatively short time. While the grid-based variant is characterized by lower memory consumption, the hybrid variant is faster if enough memory is available.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129819452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Optimizing Cloud Computing Resource Usage for Hemodynamic Simulation 优化血流动力学模拟的云计算资源使用

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI: 10.1109/IPDPS54959.2023.00063

William Ladd, Christopher W Jensen, M. Vardhan, Jeff Ames, J. Hammond, E. Draeger, A. Randles

{"title":"Optimizing Cloud Computing Resource Usage for Hemodynamic Simulation","authors":"William Ladd, Christopher W Jensen, M. Vardhan, Jeff Ames, J. Hammond, E. Draeger, A. Randles","doi":"10.1109/IPDPS54959.2023.00063","DOIUrl":"https://doi.org/10.1109/IPDPS54959.2023.00063","url":null,"abstract":"Cloud computing resources are becoming an increasingly attractive option for simulation workflows but require users to assess a wider variety of hardware options and associated costs than required by traditional in-house hardware or fixed allocations at leadership computing facilities. The pay-as-you-go model used by cloud providers gives users the opportunity to make more nuanced cost-benefit decisions at runtime by choosing hardware that best matches a given workload, but creates the risk of suboptimal allocation strategies or inadvertent cost overruns. In this work, we propose the use of an iteratively-refined performance model to optimize cloud simulation campaigns against overall cost, throughput, or maximum time to solution. Hemodynamic simulations represent an excellent use case for these assessments, as the relative costs and dominant terms in the performance model can vary widely with hardware, numerical parameters and physics models. Performance and scaling behavior of hemodynamic simulations on multiple cloud services as well as a traditional compute cluster are collected and evaluated, and an initial performance model is proposed along with a strategy for dynamically refining it with additional experimental data.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130813785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Porting a Computational Fluid Dynamics Code with AMR to Large-scale GPU Platforms 基于AMR的计算流体动力学代码移植到大规模GPU平台

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI: 10.1109/IPDPS54959.2023.00066

J. H. Davis, Justin Shafner, Daniel Nichols, N. Grube, P. Martin, A. Bhatele

引用次数: 0

FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout 基于贝叶斯推理的自适应退出的高效通信和保证准确性的联邦学习

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI: 10.1109/IPDPS54959.2023.00056

Jingjing Xue, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xue Jiang

{"title":"FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout","authors":"Jingjing Xue, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xue Jiang","doi":"10.1109/IPDPS54959.2023.00056","DOIUrl":"https://doi.org/10.1109/IPDPS54959.2023.00056","url":null,"abstract":"Federated Learning (FL) emerges as a distributed machine learning paradigm without end-user data transmission, effectively avoiding privacy leakage. Participating devices in FL are usually bandwidth-constrained, and the uplink is much slower than the downlink in wireless networks, which causes a severe uplink communication bottleneck. A prominent direction to alleviate this problem is federated dropout, which drops fractional weights of local models. However, existing federated dropout studies focus on random or ordered dropout and lack theoretical support, resulting in unguaranteed performance. In this paper, we propose Federated learning with Bayesian Inference-based Adaptive Dropout (FedBIAD), which regards weight rows of local models as probability distributions and adaptively drops partial weight rows based on importance indicators correlated with the trend of local training loss. By applying FedBIAD, each client adaptively selects a high-quality dropping pattern with accurate approximations and only transmits parameters of non-dropped weight rows to mitigate uplink costs while improving accuracy. Theoretical analysis demonstrates that the convergence rate of the average generalization error of FedBIAD is minimax optimal up to a squared logarithmic factor. Extensive experiments on image classification and next-word prediction show that compared with status quo approaches, FedBIAD provides 2× uplink reduction with an accuracy increase of up to 2.41% even on non-Independent and Identically Distributed (non-IID) data, which brings up to 72% decrease in training time.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129357731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

FIRST: Exploiting the Multi-Dimensional Attributes of Functions for Power-Aware Serverless Computing 第一:利用功能的多维属性进行无服务器计算

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI: 10.1109/IPDPS54959.2023.00091

Lu Zhang, C. Li, Xinkai Wang, Weiqi Feng, Zheng Yu, Quan Chen, Jingwen Leng, Minyi Guo, Pu Yang, Shang Yue

{"title":"FIRST: Exploiting the Multi-Dimensional Attributes of Functions for Power-Aware Serverless Computing","authors":"Lu Zhang, C. Li, Xinkai Wang, Weiqi Feng, Zheng Yu, Quan Chen, Jingwen Leng, Minyi Guo, Pu Yang, Shang Yue","doi":"10.1109/IPDPS54959.2023.00091","DOIUrl":"https://doi.org/10.1109/IPDPS54959.2023.00091","url":null,"abstract":"Emerging cloud-native development models raise new challenges for managing server performance and power at microsecond scale. Compared with traditional cloud workloads, serverless functions exhibit unprecedented heterogeneity, variability, and dynamicity. Designing cloud-native power management schemes for serverless functions requires significant engineering effort. Current solutions remain sub-optimal since their orchestration process is often one-sided, lacking a systematic view. A key obstacle to truly efficient function deployment is the fundamental wide abstraction gap between the upper-layer request scheduling and the low-level hardware execution.In this work, we show that the optimal operating point (OOP) for energy efficiency cannot be attained without synthesizing the multi-dimensional attributes of functions. We present FIRST, a novel mechanism that enables servers to better orchestrate serverless functions. The key feature of FIRST is that it leverages a lightweight Internal Representation and meta-Scheduling (IRS) layer for collecting the maximum potential revenue from the servers. Specifically, FIRST follows a pipeline-style workflow. Its frontend components aim to analyze functions from different angles and expose their key features to the system. Meanwhile, its backend components are able to make informed function assignment decisions to avoid OOP divergence. We further demonstrate the way to create extensions based on FIRST to enable versatile cloud-native power management. In total, our design constitutes a flexible management layer that supports power-aware function deployment. We show that FIRST could allow 94% functions to be processed under the OOP, which brings up to 24% energy efficiency improvements.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125484185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

SelB-k-NN: A Mini-Batch K-Nearest Neighbors Algorithm on AI Processors SelB-k-NN: AI处理器上的一种小批k近邻算法

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI: 10.1109/IPDPS54959.2023.00088

Yifeng Tang, Cho-Li Wang

{"title":"SelB-k-NN: A Mini-Batch K-Nearest Neighbors Algorithm on AI Processors","authors":"Yifeng Tang, Cho-Li Wang","doi":"10.1109/IPDPS54959.2023.00088","DOIUrl":"https://doi.org/10.1109/IPDPS54959.2023.00088","url":null,"abstract":"The popularity of Artificial Intelligence (AI) motivates novel domain-specific hardware named AI processors. With a design trade-off, the AI processors feature incredible computation power for matrix multiplications and activations, while some leave other operations less powerful, e.g., scalar operations and vectorized comparisons & selections. For k-nearest neighbors (k-NN) algorithm, consisting of distance computation phase and k-selection phase, while the former is naturally accelerated, the previous efficient k-selection becomes problematic. Moreover, limited memory forces k-NN to adopt a mini-batch manner with tiling technique. As the distance computation’s results are the k-selection’s inputs, the former’s tiling shape determines that of the latter. Since the two phases execute on separate hardware units requiring different performance analyses, whether the former’s tiling strategies benefit the latter and entire k-NN is doubtful.To address the new challenges brought by the AI processors, this paper proposes SelB-k-NN (Selection-Bitonic-k-NN), a mini-batch algorithm inspired by selection sort and bitonic k-selection. SelB-k-NN avoids the expansion of the weakly-supported operations on the huge scale of datasets. To apply SelB-k-NN to various AI processors, we propose two algorithms to reduce the hardware support requirements. Since the matrix multiplication operates data with the specifically-designed memory hierarchy which k-selection does not share, the tiling shape of the former cannot guarantee the best execution of the latter and vice versa. By quantifying the runtime workload variations of k-selection, we formulate an optimization problem to search for the optimal tiling shapes of both phases with an offline pruning method, which reduces the search space in the preprocessing stage. Evaluations show that on Huawei Ascend 310 AI processor, SelB-k-NN achieves 2.01× speedup of the bitonic k-selection, 23.93× of the heap approach, 78.52× of the CPU approach. For mini-batch SelB-k-NN, the optimal tiling shapes for two phases respectively achieve 1.48× acceleration compared with the matrix multiplication tiling shapes and 1.14× with the k-selection tiling shapes, with 72.80% of the search space pruned.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117286368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Software-Defined, Fast and Strongly-Consistent Data Replication for RDMA-Based PM Datastores 基于rdma的PM数据存储的软件定义、快速和强一致的数据复制

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI: 10.1109/IPDPS54959.2023.00019

Haodi Lu, Haikun Liu, Chencheng Ye, Xiaofei Liao, Fubing Mao, Yu Zhang, Hai Jin

{"title":"Software-Defined, Fast and Strongly-Consistent Data Replication for RDMA-Based PM Datastores","authors":"Haodi Lu, Haikun Liu, Chencheng Ye, Xiaofei Liao, Fubing Mao, Yu Zhang, Hai Jin","doi":"10.1109/IPDPS54959.2023.00019","DOIUrl":"https://doi.org/10.1109/IPDPS54959.2023.00019","url":null,"abstract":"Modern storage systems typically replicate data on multiple servers to provide high reliability and availability. However, most commercially-deployed datastores often fail to offer low latency, high throughput, and strong consistency at the same time. This paper presents Whale, a Remote Direct Memory Access (RDMA) based primary-backup replication system for in-memory datastores. Whale achieves both low latency and strong consistency by decoupling metadata multicasting from data replication for all backup nodes, and using an optimistic commitment mechanism to respond to client write requests earlier. Whale achieves high throughput by propagating writes from the primary node to backup nodes asynchronously via RDMA-optimized chain replication. To further reduce the cost of data replication, we design a log-structured datastore to fully exploit the advantages of one-sided RDMA and Persistent Memory (PM). We implement Whale on a cluster equipped with PM and InfiniBand RDMA networks. Experimental results show that Whale achieves much higher throughput and lower latency than state-of-the-art replication protocols.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121540927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Message from the IPDPS 2023 Program Chairs 来自IPDPS 2023项目主席的信息

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI: 10.1109/ipdps54959.2023.00006

引用次数: 0

A Guaranteed Approximation Algorithm for Scheduling Fork-Joins with Communication Delay 考虑通信延迟的分叉连接调度的保证逼近算法

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI: 10.1109/IPDPS54959.2023.00087

P. Dutot, Yeu-Shin Fu, Nikhil Prasad, O. Sinnen

{"title":"A Guaranteed Approximation Algorithm for Scheduling Fork-Joins with Communication Delay","authors":"P. Dutot, Yeu-Shin Fu, Nikhil Prasad, O. Sinnen","doi":"10.1109/IPDPS54959.2023.00087","DOIUrl":"https://doi.org/10.1109/IPDPS54959.2023.00087","url":null,"abstract":"Scheduling task graphs with communication delay is a widely studied NP-hard problem. Many heuristics have been proposed, but there is no constant approximation algorithm for this classic model. In this paper, we focus on the scheduling of the important class of fork-join task graphs (describing many types of common computations) on homogeneous processors. For this sub-case, we propose a guaranteed algorithm with a $left( {1 + frac{m}{{m - 1}}} right)$-approximation factor, where m is the number of processors. The algorithm is not only the first constant approximation for an important sub-domain of the classic scheduling problem, it is also a practical algorithm that can obtain shorter makespans than known heuristics. To demonstrate this, we propose adaptations of known scheduling heuristic for the specific fork-join structure. In an extensive evaluation, we then implemented these algorithms and scheduled many fork-join graphs with up to thousands of tasks and various computation time distributions on up to hundreds of processors. Comparing the obtained results demonstrates the competitive nature of the proposed approximation algorithm.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116472569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0