Multiple-tasks on multiple-devices (MTMD): exploiting concurrency in heterogeneous managed runtimes
Authors: Michail Papadimitriou, Eleni Markou, J. Fumero, Athanasios Stratikopoulos, Florin Blanaru, Christos Kotselidis
Published in: Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
Publication date: 2021-04-16
DOI: 10.1145/3453933.3454019 (https://doi.org/10.1145/3453933.3454019)
Citations: 3
Abstract
Modern commodity systems are equipped with a plethora of heterogeneous devices serving different purposes. Being able to exploit such heterogeneous hardware accelerators to their full potential is of paramount importance in the pursuit of higher performance and energy efficiency. Towards these objectives, reducing the idle time of each device and executing programs concurrently across different accelerators can lead to better scalability within the computing platform. In this work, we propose a novel approach for enabling a Java-based heterogeneous managed runtime to automatically and efficiently deploy multiple tasks on multiple devices. We extend TornadoVM with parallel execution of bytecode interpreters to dynamically and concurrently manage and execute arbitrary tasks across multiple OpenCL-compatible devices. In addition, to achieve an efficient device-task allocation, we employ a machine learning approach with a multiple-classification architecture of Extra-Trees classifiers. Our proposed solution has been evaluated over a suite of 12 applications split into three different groups. Our experimental results showcase performance improvements of up to 83% compared to all tasks running on the single best device, while reaching up to 91% of the oracle performance.
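The general idea of running several tasks concurrently on several devices can be illustrated with a minimal sketch in plain Java. This is not TornadoVM's API or the authors' implementation: the class `MtmdSketch`, the method `runAll`, and the task names are hypothetical, each "device" is simulated by a single-threaded executor (loosely standing in for one interpreter per device), and the `placement` map is a stand-in for the device choice that the paper obtains from its classifiers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; it does not use or mirror TornadoVM's API.
class MtmdSketch {

    /**
     * Runs every task on the device index given by `placement` (default 0).
     * Assumes deviceCount >= 1 and all placement indices are < deviceCount.
     */
    static Map<String, Integer> runAll(Map<String, Callable<Integer>> tasks,
                                       Map<String, Integer> placement,
                                       int deviceCount) throws Exception {
        // One single-threaded executor per simulated device, so tasks on
        // different devices overlap while tasks on one device are serialized.
        List<ExecutorService> devices = new ArrayList<>();
        for (int i = 0; i < deviceCount; i++) {
            devices.add(Executors.newSingleThreadExecutor());
        }
        try {
            // Submit everything first so no device waits on another's work.
            Map<String, Future<Integer>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<Integer>> task : tasks.entrySet()) {
                int device = placement.getOrDefault(task.getKey(), 0);
                pending.put(task.getKey(), devices.get(device).submit(task.getValue()));
            }
            // Collect results in submission order.
            Map<String, Integer> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Integer>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } finally {
            for (ExecutorService device : devices) {
                device.shutdown();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Callable<Integer>> tasks = new LinkedHashMap<>();
        tasks.put("taskA", () -> 1 + 2);
        tasks.put("taskB", () -> 6 * 7);
        // Hand-written placement; the paper predicts it with a learned model.
        System.out.println(runAll(tasks, Map.of("taskA", 0, "taskB", 1), 2));
    }
}
```

Submitting all tasks before waiting on any result is what keeps the simulated devices busy at the same time; a real runtime would additionally have to handle data transfers and dependencies between tasks, which this sketch ignores.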