Model-based Parallelization for Simulink Models on Multicore CPUs and GPUs

2019 International SoC Design Conference (ISOCC), Pub Date: 2019-10-06, DOI: 10.1109/ISOCC47750.2019.9078489

Zhaoqian Zhong, M. Edahiro


Citations: 4

Abstract



In this paper, we propose a model-based approach that parallelizes Simulink models at the block level on multicore CPUs and NVIDIA GPUs and generates CUDA C code for parallel execution. In the proposed approach, each Simulink model is converted to a directed acyclic graph (DAG) based on its block diagram, in which nodes represent tasks (groups of blocks in the model) and edges represent communication between blocks. Next, a path analysis of the DAG extracts all execution paths and calculates the length of each path, which is the sum of the execution times of the tasks and the communication times of the edges on that path. Then, an integer linear programming (ILP) formulation minimizes the length of the critical path of the DAG, which represents the execution time of the Simulink model. The ILP formulation also balances the workloads across the CPU cores to improve hardware utilization. We evaluate the effectiveness of the proposed approach by parallelizing an image processing model on a platform with two homogeneous CPU cores and two GPUs.
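The pipeline described in the abstract (DAG, path analysis, critical-path minimization with CPU load balancing) can be illustrated with a small Python sketch. This is not the authors' implementation: the task names, execution times, communication costs, and processor names are all hypothetical, and an exhaustive search over mappings stands in for the ILP solver. The sketch also assumes that communication time is charged only when the two ends of an edge run on different processors, and it ignores contention between tasks mapped to the same processor.

```python
from itertools import product

# Hypothetical tasks: execution time on each processor kind (arbitrary units).
EXEC = {
    "load":  {"cpu": 2, "gpu": 5},
    "blur":  {"cpu": 8, "gpu": 2},
    "edges": {"cpu": 6, "gpu": 2},
    "merge": {"cpu": 3, "gpu": 6},
}
# Hypothetical DAG edges with communication times; the cost is assumed to
# apply only when source and destination are mapped to different processors.
COMM = {
    ("load", "blur"): 1,
    ("load", "edges"): 1,
    ("blur", "merge"): 1,
    ("edges", "merge"): 1,
}
# Platform mirroring the abstract: two CPU cores and two GPUs (names assumed).
PROCS = ["cpu0", "cpu1", "gpu0", "gpu1"]


def all_paths(edges):
    """Enumerate every source-to-sink path of the DAG (path analysis)."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    nodes = {n for edge in edges for n in edge}
    sources = sorted(nodes - {dst for _, dst in edges})

    def walk(node, prefix):
        path = prefix + [node]
        if node not in succ:          # sink reached
            yield path
        else:
            for nxt in succ[node]:
                yield from walk(nxt, path)

    for src in sources:
        yield from walk(src, [])


def path_length(path, mapping):
    """Execution times of the tasks plus communication times of cut edges."""
    total = sum(EXEC[task][mapping[task][:3]] for task in path)
    for src, dst in zip(path, path[1:]):
        if mapping[src] != mapping[dst]:
            total += COMM[(src, dst)]
    return total


def critical_path(mapping, paths):
    """Length of the longest path under a given task-to-processor mapping."""
    return max(path_length(p, mapping) for p in paths)


def cpu_imbalance(mapping):
    """Difference between the most and least loaded CPU core."""
    loads = {p: 0 for p in PROCS if p.startswith("cpu")}
    for task, proc in mapping.items():
        if proc in loads:
            loads[proc] += EXEC[task]["cpu"]
    return max(loads.values()) - min(loads.values())


def best_mapping():
    """Exhaustive stand-in for the ILP: minimize the critical path first,
    then break ties by CPU workload imbalance."""
    paths = list(all_paths(COMM))
    tasks = sorted(EXEC)
    best = None
    for choice in product(PROCS, repeat=len(tasks)):
        mapping = dict(zip(tasks, choice))
        key = (critical_path(mapping, paths), cpu_imbalance(mapping))
        if best is None or key < best[0]:
            best = (key, mapping)
    return best


if __name__ == "__main__":
    (length, imbalance), mapping = best_mapping()
    print("critical path:", length, "CPU imbalance:", imbalance)
    print("mapping:", mapping)
```

Exhaustive search is only practical for toy graphs, since the number of mappings grows as the number of processors raised to the number of tasks. An ILP formulation, as the abstract describes, encodes the same objective with binary assignment variables and hands it to a solver instead.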
