H. Yviquel, E. Casseau, M. Raulet, P. Jääskeläinen, J. Takala
Towards run-time actor mapping of dynamic dataflow programs onto multi-core platforms
2013 8th International Symposium on Image and Signal Processing and Analysis (ISPA), published 2013-09-01
DOI: 10.1109/ISPA.2013.6703834 (https://doi.org/10.1109/ISPA.2013.6703834)
Citation count: 20
Abstract
The emergence of massively parallel architectures, along with the need for new parallel programming models, has revived interest in dataflow programming because of its ability to express concurrency. Although dynamic dataflow programming is a flexible approach to developing scalable applications, open problems remain concerning the execution of such programs. In this paper, we propose a low-cost methodology for mapping dynamic dataflow programs onto any multi-core platform. By translating the mapping problem into an equivalent graph partitioning problem, our approach finds useful mapping solutions in a few milliseconds, which makes it feasible to recompute the mapping at regular intervals. Consequently, good load balancing over the targeted platform can be maintained even for such unpredictable applications. We conduct experiments on three MPEG video decoders, including one based on the new High Efficiency Video Coding (HEVC) standard. These dataflow-based video decoders are executed on two different platforms: a desktop multi-core processor, and an embedded platform composed of small, interconnected Very Long Instruction Word (VLIW)-style processors. Our entire design flow is based on open-source tools. We present the influence of the number of processors on performance and show that our method reaches its maximum decoding rate with 16 processors.
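To make the idea of treating actor mapping as graph partitioning more concrete, here is a minimal sketch. It is not the partitioner used in the paper, which the abstract does not describe; it is a simple greedy heuristic swapped in for illustration. The actor names, workload figures, traffic volumes, and the functions `map_actors` and `cut_weight` are all hypothetical. Actors are graph vertices weighted by estimated workload, and FIFO connections are edges weighted by communication volume.

```python
from collections import defaultdict


def map_actors(workload, traffic, num_cores):
    """Greedy illustration (not the paper's algorithm): place the heaviest
    actors first, each on the core with the lowest resulting load minus the
    traffic it shares with actors already placed on that core."""
    # Symmetric adjacency: communication volume between each pair of actors.
    neighbours = defaultdict(dict)
    for (a, b), volume in traffic.items():
        neighbours[a][b] = neighbours[a].get(b, 0) + volume
        neighbours[b][a] = neighbours[b].get(a, 0) + volume

    load = [0.0] * num_cores
    mapping = {}
    # Heaviest first; ties broken by name so the result is deterministic.
    for actor in sorted(workload, key=lambda a: (-workload[a], a)):
        def cost(core):
            local = sum(v for n, v in neighbours[actor].items()
                        if mapping.get(n) == core)
            # Assumption: load and traffic share one unit, a simplification.
            return load[core] + workload[actor] - local

        best = min(range(num_cores), key=cost)
        mapping[actor] = best
        load[best] += workload[actor]
    return mapping, load


def cut_weight(mapping, traffic):
    """Total communication volume crossing core boundaries."""
    return sum(v for (a, b), v in traffic.items() if mapping[a] != mapping[b])


# Hypothetical four-actor decoder fragment, invented for this example.
workload = {"parser": 40, "residual": 25, "prediction": 30, "filter": 15}
traffic = {("parser", "residual"): 8, ("parser", "prediction"): 6,
           ("residual", "filter"): 5, ("prediction", "filter"): 5}

mapping, load = map_actors(workload, traffic, num_cores=2)
print("mapping:", mapping)
print("load per core:", load)
print("inter-core traffic:", cut_weight(mapping, traffic))
```

Because a heuristic of this kind runs in time roughly proportional to the number of actors times the number of cores plus the number of edges, it can be rerun with fresh workload measurements whenever the application's behaviour changes, which is the property the abstract relies on for run-time remapping. A dedicated graph partitioner would normally give better cuts than this sketch.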