GraphCL:在多设备平台上执行数据流图的框架

2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP) Pub Date : 2022-03-01 DOI:10.1109/pdp55904.2022.00026

Konrad Moren, D. Göhringer

{"title":"GraphCL:在多设备平台上执行数据流图的框架","authors":"Konrad Moren, D. Göhringer","doi":"10.1109/pdp55904.2022.00026","DOIUrl":null,"url":null,"abstract":"This article introduces GraphCL, an automated system for seamlessly mapping multi-kernel applications to multiple computing devices. GraphCL consists of a C ++ API and a runtime that abstracts and simplifies the execution of multi-kernel applications on heterogeneous platforms across multiple devices. The GraphCL approach has three steps. First, the application designer provides a kernel graph. In the second phase, GraphCL computes the execution schedule. After the schedule has been computed, the runtime uses the execution schedule to enqueue in parallel the processing for all system processors. GraphCL takes the kernel dependencies and the processor performance differences into account during the schedule calculation process. By deciding on the schedule, GraphCL transparently manages the order of execution and data transfers for each processor. On two asymmetric workstations, GraphCL achieves an average acceleration of 1.8x compared to the fastest device. GraphCL achieves also for the set of multi-kernel benchmarks an average 24.5% energy reduction compared to the lazy partition heuristic, that uses all the system processors without considering their power usage.","PeriodicalId":210759,"journal":{"name":"2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"GraphCL: A Framework for Execution of Data-Flow Graphs on Multi-Device Platforms\",\"authors\":\"Konrad Moren, D. Göhringer\",\"doi\":\"10.1109/pdp55904.2022.00026\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article introduces GraphCL, an automated system for seamlessly mapping multi-kernel applications to multiple computing devices. GraphCL consists of a C ++ API and a runtime that abstracts and simplifies the execution of multi-kernel applications on heterogeneous platforms across multiple devices. The GraphCL approach has three steps. First, the application designer provides a kernel graph. In the second phase, GraphCL computes the execution schedule. After the schedule has been computed, the runtime uses the execution schedule to enqueue in parallel the processing for all system processors. GraphCL takes the kernel dependencies and the processor performance differences into account during the schedule calculation process. By deciding on the schedule, GraphCL transparently manages the order of execution and data transfers for each processor. On two asymmetric workstations, GraphCL achieves an average acceleration of 1.8x compared to the fastest device. GraphCL achieves also for the set of multi-kernel benchmarks an average 24.5% energy reduction compared to the lazy partition heuristic, that uses all the system processors without considering their power usage.\",\"PeriodicalId\":210759,\"journal\":{\"name\":\"2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/pdp55904.2022.00026\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/pdp55904.2022.00026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

本文介绍GraphCL，这是一个用于将多内核应用程序无缝映射到多个计算设备的自动化系统。GraphCL由一个c++ API和一个运行时组成，该运行时抽象并简化了跨多个设备的异构平台上的多内核应用程序的执行。GraphCL方法有三个步骤。首先，应用程序设计器提供一个内核图。在第二个阶段，GraphCL计算执行计划。计算完调度后，运行时使用执行调度并行地为所有系统处理器的处理排队。在调度计算过程中，GraphCL考虑了内核依赖关系和处理器性能差异。通过决定调度，GraphCL透明地管理每个处理器的执行顺序和数据传输。在两个非对称工作站上，与最快的设备相比，GraphCL实现了1.8倍的平均加速。与惰性分区启发式方法相比，GraphCL还实现了多内核基准测试集平均24.5%的能耗降低，惰性分区启发式方法使用所有系统处理器而不考虑它们的功耗。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

GraphCL: A Framework for Execution of Data-Flow Graphs on Multi-Device Platforms

This article introduces GraphCL, an automated system for seamlessly mapping multi-kernel applications to multiple computing devices. GraphCL consists of a C ++ API and a runtime that abstracts and simplifies the execution of multi-kernel applications on heterogeneous platforms across multiple devices. The GraphCL approach has three steps. First, the application designer provides a kernel graph. In the second phase, GraphCL computes the execution schedule. After the schedule has been computed, the runtime uses the execution schedule to enqueue in parallel the processing for all system processors. GraphCL takes the kernel dependencies and the processor performance differences into account during the schedule calculation process. By deciding on the schedule, GraphCL transparently manages the order of execution and data transfers for each processor. On two asymmetric workstations, GraphCL achieves an average acceleration of 1.8x compared to the fastest device. GraphCL achieves also for the set of multi-kernel benchmarks an average 24.5% energy reduction compared to the lazy partition heuristic, that uses all the system processors without considering their power usage.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)

自引率

0.00%

发文量