Measuring Thread Timing to Assess the Feasibility of Early-Bird Message Delivery Across Systems and Scales

IF 1.5, Zone 4 (Computer Science), JCR Q3: COMPUTER SCIENCE, SOFTWARE ENGINEERING

Concurrency and Computation-Practice & Experience Pub Date : 2024-12-12 DOI:10.1002/cpe.8342

W. Pepper Marts, Matthew G. F. Dosanjh, Whit Schonbein, Scott Levy, Patrick G. Bridges



Abstract

Early-bird communication is a communication/computation overlap technique that leverages fine-grained communication to improve application run time. Communication is divided so that each thread can initiate transmission of its portion of the data as soon as it completes, rather than waiting for a dedicated communication phase. The benefit of early-bird communication depends on the completion timing of the individual threads. On the one hand, if all threads complete at nearly the same time, the overheads of sending multiple messages accumulate, leading to performance that is worse than if a single message had been sent. On the other hand, if thread completions are spread out in time, threads that complete earlier can send data while others continue working, leading to performance that is better than if a single message had been sent. The challenge is that completion times are currently unknown and can vary with the application, problem size, system software, and underlying hardware. In this paper, we address this gap by measuring and evaluating the potential overlap afforded by early-bird communication for a selection of proxy applications. These measurements help us understand whether a given application could benefit from early-bird communication. We present our technique for gathering this data and evaluate data collected from three proxy applications: MiniFE, MiniMD, and MiniQMC. Each application is run on three systems with distinct CPU architectures and is strong-scaled across three run sizes. To characterize the behavior of these workloads, we study the trends of thread timings at both a macro level (across all threads and all runs of an application) and a micro level (within a single process of a single run). We observe that our tested applications exhibit significantly different thread arrival distributions. The machine used also had a significant impact, with the window of potential overlap varying by as much as an order of magnitude.
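
The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not the authors' measurement technique: it assumes each message costs a fixed latency plus size divided by bandwidth, and that all sends serialize on a single link. The names `arrival_stats`, `bulk_finish`, and `early_bird_finish`, and all parameters and units, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ArrivalStats:
    first: float      # earliest thread completion time
    last: float       # latest thread completion time
    window: float     # last - first: span in which overlap is possible
    mean_idle: float  # average time a finished thread waits for the last one


def arrival_stats(completions: Sequence[float]) -> ArrivalStats:
    """Summarize per-thread completion times for one parallel region."""
    if not completions:
        raise ValueError("need at least one completion time")
    first, last = min(completions), max(completions)
    mean_idle = sum(last - t for t in completions) / len(completions)
    return ArrivalStats(first, last, last - first, mean_idle)


def bulk_finish(completions: Sequence[float], size_per_thread: float,
                latency: float, bandwidth: float) -> float:
    """Assumed model: one message, sent after the last thread completes."""
    total = size_per_thread * len(completions)
    return max(completions) + latency + total / bandwidth


def early_bird_finish(completions: Sequence[float], size_per_thread: float,
                      latency: float, bandwidth: float) -> float:
    """Assumed model: each thread sends its portion on completion.

    Sends are serialized on one link, and every message pays the
    per-message latency, so overheads accumulate when arrivals are tight.
    """
    link_free = 0.0
    for t in sorted(completions):
        start = max(t, link_free)  # wait for the thread and for the link
        link_free = start + latency + size_per_thread / bandwidth
    return link_free


if __name__ == "__main__":
    tight = [10.0, 10.0, 10.0, 10.0]  # all threads finish together
    spread = [1.0, 4.0, 7.0, 10.0]    # completions spread out in time
    for name, times in (("tight", tight), ("spread", spread)):
        print(name,
              "window:", arrival_stats(times).window,
              "bulk:", bulk_finish(times, 4.0, 1.0, 1.0),
              "early-bird:", early_bird_finish(times, 4.0, 1.0, 1.0))
```

Under these assumed costs, the tight arrival pattern makes early-bird delivery slower than a single message because the per-message latency is paid four times with nothing to overlap, while the spread pattern lets earlier sends hide behind the remaining computation. Real behavior depends on the measured arrival distributions, which is what the paper sets out to collect.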


Source journal

Concurrency and Computation-Practice & Experience (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore

5.00

Self-citation rate

10.00%

Articles published

664

Review time

9.6 months

About the journal: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality original research papers and authoritative research review papers in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.