ANDARE '17 Latest Articles

Auto-tuning Static Schedules for Task Data-flow Applications
ANDARE '17 Pub Date: 2017-09-09 DOI: 10.1145/3152821.3152879
Andreas Diavastos, P. Trancoso
Abstract: Scheduling task-based parallel applications on many-core processors is becoming more challenging and has received much attention recently. The main challenge is to efficiently map tasks to the underlying hardware topology using application characteristics, such as the dependences between tasks, in order to satisfy performance requirements. To achieve this, each application must be studied exhaustively to determine how its tasks use data, which provides the knowledge needed to map tasks that share the same data close to each other. In addition, different hardware topologies require different mappings of the same application to produce the best performance.
In this work we use the synchronization graph of a task-based parallel application, produced during compilation, to automatically tune the scheduling policy for any underlying hardware using heuristic-based Genetic Algorithm techniques. The tool is integrated into an actual task-based parallel programming platform called SWITCHES and is evaluated using real applications from the SWITCHES benchmark suite. We compare our results with the execution times of the predefined schedules within SWITCHES and observe that the tool can converge close to an optimal solution with no effort from the user, while using fewer resources.
Cited by: 2
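The abstract above describes evolving task-to-core mappings with a Genetic Algorithm over the application's synchronization graph. The SWITCHES tool itself is not shown here, so the following is only an illustrative sketch: the task graph, the cost model (communication between shared-data tasks plus a load-imbalance penalty), and all constants are made up for the example.

```python
import random

# Hypothetical synchronization graph: edges (task_a, task_b, shared_bytes).
# In the paper this graph comes from the SWITCHES compiler; here it is invented.
EDGES = [(0, 1, 64), (1, 2, 128), (2, 3, 64), (0, 3, 32)]
NUM_TASKS, NUM_CORES = 4, 2

def cost(mapping):
    # Penalize separating tasks that share data, and unbalanced core loads.
    comm = sum(w for a, b, w in EDGES if mapping[a] != mapping[b])
    loads = [mapping.count(c) for c in range(NUM_CORES)]
    return comm + 100 * (max(loads) - min(loads))

def evolve(pop_size=20, generations=50, mutation_rate=0.1, seed=42):
    rng = random.Random(seed)
    # A chromosome assigns each task to a core.
    pop = [[rng.randrange(NUM_CORES) for _ in range(NUM_TASKS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]       # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, NUM_TASKS)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < mutation_rate:   # random reassignment
                child[rng.randrange(NUM_TASKS)] = rng.randrange(NUM_CORES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)

best = evolve()
```

On this toy graph the best balanced mapping keeps tasks 1 and 2 (which exchange the most data) on the same core; the real tool explores the same kind of space, but driven by measured execution times on actual hardware.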
Autotuning of OpenCL Kernels with Global Optimizations
ANDARE '17 Pub Date: 2017-09-09 DOI: 10.1145/3152821.3152877
J. Filipovič, Filip Petrovic, S. Benkner
Abstract: Autotuning is an important method for automatically exploring code optimizations. It may target low-level code optimizations, such as memory blocking, loop unrolling or memory prefetching, as well as high-level optimizations, such as placing computation kernels on the proper hardware devices or optimizing memory transfers between nodes, or between accelerators and main memory.
In this paper, we introduce an autotuning method that extends state-of-the-art low-level tuning of OpenCL or CUDA kernels towards more complex optimizations. More precisely, we introduce the Kernel Tuning Toolkit (KTT), which implements inter-kernel global optimizations, allowing the tuning of parameters that affect multiple kernels or the host code. We demonstrate on practical examples that global kernel optimizations make it possible to explore tuning options that are not reachable when kernels are tuned separately. Moreover, our tuning strategies can take numerical accuracy across multiple kernel invocations into account and search for implementations within specific numerical error bounds.
Cited by: 16
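Why tuning kernels jointly can beat tuning them separately can be shown with a toy model (this is not KTT's API; the timing numbers, the shared data-layout parameter, and the conversion cost are all invented for illustration): when two kernels share a parameter such as data layout, per-kernel optima may disagree and force a conversion between them, while a global search over the joint space sees that cost.

```python
from itertools import product

# Hypothetical per-kernel timings (ms) for a layout parameter shared by
# two pipelined kernels. All numbers are made up.
K1_TIME = {"AoS": 1.0, "SoA": 3.0}   # kernel 1 prefers array-of-structs
K2_TIME = {"AoS": 4.0, "SoA": 1.5}   # kernel 2 prefers struct-of-arrays
CONVERT = 2.5                        # cost of converting layouts in between

def pipeline_time(layout1, layout2):
    t = K1_TIME[layout1] + K2_TIME[layout2]
    if layout1 != layout2:
        t += CONVERT  # layouts disagree: pay a conversion step
    return t

# Separate tuning: each kernel independently picks its own best layout.
best1 = min(K1_TIME, key=K1_TIME.get)   # "AoS"
best2 = min(K2_TIME, key=K2_TIME.get)   # "SoA"
separate = pipeline_time(best1, best2)  # pays the conversion cost

# Global tuning: search the joint space, conversion cost included.
global_best = min(pipeline_time(l1, l2)
                  for l1, l2 in product(K1_TIME, K2_TIME))
```

Here the globally best configuration uses SoA for both kernels, beating the combination of per-kernel optima even though kernel 1 runs slower in isolation; inter-kernel tuning in KTT targets exactly this kind of trade-off.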
Adaptive Performance Sensitivity Model to Support GPU Power Management
ANDARE '17 Pub Date: 2017-09-09 DOI: 10.1145/3152821.3152822
Francesco Paterna, U. Gupta, R. Ayoub, Ümit Y. Ogras, M. Kishinevsky
Abstract: Integrated graphics units consume a large portion of the power in client and mobile systems. Proactive power-management algorithms have been devised to meet expected user experience while reducing energy consumption. These techniques often rely on power and performance sensitivity models constructed at design time using a set of workloads. However, the lack of representative workloads and the model-identification overhead adversely impact accuracy and development time, respectively. Conversely, two main challenges limit post-design identification at runtime: the absence of sensitivity feedback from the system and the limited computational resources. We propose a two-stage methodology that first identifies the features of the sensitivity model offline, leveraging a reduced amount of training data, and then uses a recursive least-squares algorithm to fit and adapt the model coefficients to workload changes at runtime. The proposed adaptive approach can reduce offline training data by 50% with respect to full offline model identification while maintaining an average accuracy of 95%.
Cited by: 2
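The runtime stage described above adapts linear model coefficients with recursive least squares (RLS), which is cheap enough for online use because each update is a few small matrix-vector products. A generic RLS sketch, assuming a linear sensitivity model over made-up counter features (the paper's actual features and data are not shown here):

```python
import numpy as np

def rls_update(theta, P, x, y, lam=0.98):
    """One recursive-least-squares step: adapt coefficients theta so that
    x @ theta tracks the observed sensitivity y. lam is a forgetting
    factor that discounts old workload behavior."""
    x = x.reshape(-1, 1)
    k = P @ x / (lam + x.T @ P @ x)      # gain vector
    err = y - (x.T @ theta).item()       # prediction error
    theta = theta + k * err              # coefficient correction
    P = (P - k @ x.T @ P) / lam          # covariance update
    return theta, P

# Fit y = 2*x0 + 3*x1 from streaming noiseless samples.
rng = np.random.default_rng(0)
theta = np.zeros((2, 1))
P = np.eye(2) * 1000.0                   # large initial covariance: weak prior
true_w = np.array([2.0, 3.0])
for _ in range(200):
    x = rng.uniform(0, 1, 2)
    y = float(x @ true_w)
    theta, P = rls_update(theta, P, x, y)
```

With lam < 1 the filter keeps reweighting recent samples, which is what lets the coefficients follow workload phase changes at runtime instead of freezing at a design-time fit.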
Benefits in Relaxing the Power Capping Constraint
ANDARE '17 Pub Date: 2017-09-09 DOI: 10.1145/3152821.3152878
Daniel Cesarini, Andrea Bartolini, L. Benini
Abstract: In this manuscript we evaluate the impact of hardware power-capping mechanisms on a real parallel scientific application. By comparing the hardware capping mechanism against static frequency-allocation schemes, we show that a speedup can be achieved if the power constraint is enforced on average over the application run, instead of over short time periods. RAPL, which enforces the power constraint on a time scale of a few milliseconds, fails to share the power budget between more demanding and less demanding application phases.
Cited by: 8
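The intuition in the abstract, that averaging the cap lets low-demand phases donate headroom to high-demand ones, can be illustrated with a toy model. Everything here is invented for the example (two synthetic phases, watt figures, and "granted power" as a crude performance proxy); it is not a model of RAPL's actual controller.

```python
# Toy model: a compute-bound phase with high power demand and a
# memory-bound phase with low demand. Useful work is proxied by the
# power each phase is actually granted, up to its demand.
PHASES = [("compute", 120.0), ("memory", 60.0)]  # (name, demand in W)
CAP = 90.0                                       # power budget in W

def per_interval(phases, cap):
    # RAPL-style: the cap is enforced in every short interval,
    # so each phase gets at most `cap` watts regardless of the others.
    return sum(min(demand, cap) for _, demand in phases)

def on_average(phases, cap):
    # Relaxed: only the run-wide average must stay under the cap, so
    # headroom unused by low-demand phases boosts demanding ones.
    budget = cap * len(phases)
    granted = []
    for _, demand in sorted(phases, key=lambda p: p[1]):
        share = min(demand, budget / (len(phases) - len(granted)))
        granted.append(share)
        budget -= share
    return sum(granted)
```

Under the per-interval cap the compute phase is clipped to 90 W while the memory phase leaves 30 W unused; enforcing the same 90 W only on average lets the compute phase run unconstrained, which is the effect the paper measures on a real application.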
Exploiting Parallelism on GPUs and FPGAs with OmpSs
ANDARE '17 Pub Date: 2017-09-09 DOI: 10.1145/3152821.3152880
Jaume Bosch, Antonio Filgueras, Miquel Vidal Piñol, Daniel Jiménez-González, C. Álvarez, X. Martorell
Abstract: This paper presents the OmpSs approach to heterogeneous programming on GPU and FPGA accelerators. The OmpSs programming model is based on the Mercurium compiler and the Nanos++ runtime. Applications are annotated with compiler directives specifying task-based parallelism. The Mercurium compiler transforms the code to exploit the parallelism in the SMP host cores and to spawn work on CUDA/OpenCL devices and FPGA accelerators. For the CUDA/OpenCL devices, the programmer only needs to insert the annotations and provide the kernel function, which is compiled by the native CUDA/OpenCL compiler. For FPGAs, OmpSs uses the High-Level Synthesis tools from FPGA vendors to generate the IP configurations for the FPGA. We present the performance obtained on the matrix multiply benchmark on the Xilinx Zynq UltraScale+ as a result of using OmpSs.
Cited by: 14