{"title":"为我们的未来寻求统一的全局视图并行编程模型","authors":"K. Taura","doi":"10.1145/2931088.2931089","DOIUrl":null,"url":null,"abstract":"Developing highly scalable programs on today's HPC machines is becoming ever more challenging, due to decreasing byte-flops ratio, deepening memory/network hierarchies, and heterogeneity. Programmers need to learn a distinct programming API for each layer of the hierarchy and overcome performance issues at all layers, one at a time, when the underlying high-level principle for performance is in fact fairly common across layers---locality. Future programming models must allow the programmer to express locality and parallelism in high level terms and their implementation should map exposed parallelism onto different layers of the machine (nodes, cores, and vector units) efficiently by concerted efforts of compilers and runtime systems. In this talk, I will argue that a global view task parallel programming model is a promising direction toward this goal that can reconcile generality, programmability, and performance at a high level. I will then talk about our ongoing research efforts with this prospect. They include: MassiveThreads, a lightweight user-level thread package for multicore systems; MassiveThreads/DM, its extension to distributed memory machines; DAGViz, a performance analyzer specifically designed for task parallel programs; and a task-vectorizing compiler that transforms task parallel programs into vectorized and parallelized instructions. I will end by sharing our prospects on how emerging hardware features and fruitful co-design efforts may help achieve the challenging goal.","PeriodicalId":262414,"journal":{"name":"Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Quest for Unified, Global View Parallel Programming Models for Our Future\",\"authors\":\"K. Taura\",\"doi\":\"10.1145/2931088.2931089\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Developing highly scalable programs on today's HPC machines is becoming ever more challenging, due to decreasing byte-flops ratio, deepening memory/network hierarchies, and heterogeneity. Programmers need to learn a distinct programming API for each layer of the hierarchy and overcome performance issues at all layers, one at a time, when the underlying high-level principle for performance is in fact fairly common across layers---locality. Future programming models must allow the programmer to express locality and parallelism in high level terms and their implementation should map exposed parallelism onto different layers of the machine (nodes, cores, and vector units) efficiently by concerted efforts of compilers and runtime systems. In this talk, I will argue that a global view task parallel programming model is a promising direction toward this goal that can reconcile generality, programmability, and performance at a high level. I will then talk about our ongoing research efforts with this prospect. 
They include: MassiveThreads, a lightweight user-level thread package for multicore systems; MassiveThreads/DM, its extension to distributed memory machines; DAGViz, a performance analyzer specifically designed for task parallel programs; and a task-vectorizing compiler that transforms task parallel programs into vectorized and parallelized instructions. I will end by sharing our prospects on how emerging hardware features and fruitful co-design efforts may help achieve the challenging goal.\",\"PeriodicalId\":262414,\"journal\":{\"name\":\"Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2931088.2931089\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2931088.2931089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Developing highly scalable programs on today's HPC machines is becoming ever more challenging, due to the decreasing byte-per-flop ratio, deepening memory and network hierarchies, and growing heterogeneity. Programmers must learn a distinct programming API for each layer of the hierarchy and overcome performance issues at every layer, one at a time, even though the underlying high-level principle for performance, locality, is in fact largely common across layers. Future programming models must allow the programmer to express locality and parallelism in high-level terms, and their implementations should map the exposed parallelism onto the different layers of the machine (nodes, cores, and vector units) efficiently through the concerted efforts of compilers and runtime systems. In this talk, I will argue that a global-view task-parallel programming model is a promising direction toward this goal, one that can reconcile generality, programmability, and performance at a high level. I will then talk about our ongoing research efforts toward this prospect. They include: MassiveThreads, a lightweight user-level thread package for multicore systems; MassiveThreads/DM, its extension to distributed-memory machines; DAGViz, a performance analyzer specifically designed for task-parallel programs; and a task-vectorizing compiler that transforms task-parallel programs into vectorized and parallelized instructions. I will end by sharing our prospects on how emerging hardware features and fruitful co-design efforts may help achieve this challenging goal.
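To make the programming model concrete, below is a minimal sketch of the global-view, task-parallel style the talk advocates: the programmer expresses all logical parallelism as tasks against a single view of the computation, and the runtime, not the programmer, decides how tasks are mapped onto the machine. The sketch uses standard OpenMP tasks purely for illustration; it is not the MassiveThreads API discussed in the talk, and the choice of Fibonacci as the divide-and-conquer workload is only a conventional example.

```c
/* Global-view task parallelism, sketched with OpenMP tasks (illustrative
 * only; not the MassiveThreads API).  The programmer exposes parallelism
 * by spawning tasks; the runtime schedules them across cores. */
#include <stdio.h>
#include <omp.h>

static long fib(int n)
{
    if (n < 2)
        return n;

    long x, y;

    /* Spawn one recursive call as a child task; which core runs it
     * is the runtime's decision, not the programmer's. */
    #pragma omp task shared(x)
    x = fib(n - 1);

    y = fib(n - 2);         /* The parent computes the other half. */

    #pragma omp taskwait    /* Join: wait for the child task to finish. */
    return x + y;
}

int main(void)
{
    long r;
    #pragma omp parallel
    #pragma omp single      /* One thread seeds the task tree; all threads
                             * in the team help execute the spawned tasks. */
    r = fib(30);
    printf("fib(30) = %ld\n", r);
    return 0;
}
```

The point of the style is that the same source expresses the full parallel structure of the computation; a unified model as envisioned in the talk would let compilers and runtimes map that one structure onto nodes, cores, and vector units alike.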