Runtime Code Generation and Data Management for Heterogeneous Computing in Java

Proceedings of the Principles and Practices of Programming on The Java Platform. Pub Date: 2015-09-08. DOI: 10.1145/2807426.2807428

J. Fumero, Toomas Remmelg, Michel Steuwer, Christophe Dubach


Citations: 20

Abstract


GPUs (Graphics Processing Units) and other accelerators are nowadays commonly found in desktop machines, mobile devices and even data centres. While these highly parallel processors offer high raw performance, they also dramatically increase program complexity, requiring extra effort from programmers. Because the languages used to program these devices are low-level, the resulting code is difficult to maintain and not portable. This paper presents a high-level parallel programming approach for the popular Java programming language. Our goal is to revitalise the old Java slogan, "Write once, run anywhere", in the context of modern heterogeneous systems. To enable the use of parallel accelerators from Java, we introduce a new API for heterogeneous programming based on arrays and functional programming. Applications written with our API can then be transparently accelerated on a device such as a GPU using our runtime OpenCL code generator. To ensure the highest level of performance, we present data management optimizations. Usually, data has to be translated (marshalled) between the Java representation and the representation the accelerator uses. This paper shows how marshalling affects runtime and presents a novel technique in Java that avoids this cost by implementing our own customised array data structure. Our design hides low-level data management from the user, making our approach applicable even for inexperienced Java programmers. We evaluated our technique using a set of applications from different domains, including mathematical finance and machine learning, and achieved speedups of up to 500× over sequential and multi-threaded Java code when using an external GPU.
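To make the marshalling cost concrete, here is a minimal plain-Java sketch, assuming a simple pair-of-floats input. It contrasts copying an array of boxed Java objects into a flat, native-ordered buffer (marshalling) with a container that stores its elements in that flat layout from the start, plus a sequential `map` standing in for the functional API. All names here (`MarshalDemo`, `FloatPair`, `FlatPairArray`, `FloatBinOp`) are hypothetical illustrations, not the paper's actual API or data structure, and no OpenCL code is generated or executed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical demo entry point; declared first so the single-file source
// launcher (`java File.java`, Java 11+) picks it up.
final class MarshalDemo {

    // Marshalling: every boxed object is copied into a flat, native-ordered
    // buffer before it could be passed to a device.
    static FloatBuffer marshal(FloatPair[] pairs) {
        FloatBuffer buf = ByteBuffer.allocateDirect(pairs.length * 2 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (FloatPair p : pairs) {
            buf.put(p.a).put(p.b); // one copy per field, per element
        }
        buf.flip(); // make the written region readable from index 0
        return buf;
    }

    // Sequential stand-in for a functional map; a runtime code generator
    // could instead translate the lambda into an OpenCL kernel.
    static float[] map(FlatPairArray in, FloatBinOp f) {
        float[] out = new float[in.size()];
        for (int i = 0; i < in.size(); i++) {
            out[i] = f.apply(in.a(i), in.b(i));
        }
        return out;
    }

    public static void main(String[] args) {
        FloatPair[] boxed = { new FloatPair(1f, 2f), new FloatPair(3f, 4f) };
        FloatBuffer marshalled = marshal(boxed); // extra O(n) copy

        FlatPairArray flat = new FlatPairArray(2); // already in flat layout
        flat.set(0, 1f, 2f);
        flat.set(1, 3f, 4f);

        float[] sums = map(flat, (x, y) -> x + y);
        System.out.println(marshalled.get(3) + " " + sums[0] + " " + sums[1]);
        // prints "4.0 3.0 7.0"
    }
}

// Hypothetical functional interface for a two-argument float function.
interface FloatBinOp {
    float apply(float x, float y);
}

// Hypothetical boxed pair, as a user might write it in plain Java.
final class FloatPair {
    final float a, b;
    FloatPair(float a, float b) { this.a = a; this.b = b; }
}

// Hypothetical flat container: both fields are stored interleaved
// (a0, b0, a1, b1, ...) in one direct buffer, so no per-element copy is
// needed before handing the data to native code.
final class FlatPairArray {
    private final FloatBuffer data;
    private final int size;

    FlatPairArray(int size) {
        this.size = size;
        this.data = ByteBuffer.allocateDirect(size * 2 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    int size() { return size; }
    void set(int i, float a, float b) { data.put(2 * i, a); data.put(2 * i + 1, b); }
    float a(int i) { return data.get(2 * i); }
    float b(int i) { return data.get(2 * i + 1); }
    FloatBuffer buffer() { return data; } // the whole backing storage
}
```

Because `FlatPairArray` keeps its data in a device-friendly layout from the start, the copy performed by `marshal` disappears; the trade-off in this sketch is that elements are read and written through accessor methods rather than handled as ordinary Java objects.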

