{"title":"为我们的未来寻求统一的全局视图并行编程模型","authors":"K. Taura","doi":"10.1145/2931088.2931089","DOIUrl":null,"url":null,"abstract":"Developing highly scalable programs on today's HPC machines is becoming ever more challenging, due to decreasing byte-flops ratio, deepening memory/network hierarchies, and heterogeneity. Programmers need to learn a distinct programming API for each layer of the hierarchy and overcome performance issues at all layers, one at a time, when the underlying high-level principle for performance is in fact fairly common across layers---locality. Future programming models must allow the programmer to express locality and parallelism in high level terms and their implementation should map exposed parallelism onto different layers of the machine (nodes, cores, and vector units) efficiently by concerted efforts of compilers and runtime systems. In this talk, I will argue that a global view task parallel programming model is a promising direction toward this goal that can reconcile generality, programmability, and performance at a high level. I will then talk about our ongoing research efforts with this prospect. They include: MassiveThreads, a lightweight user-level thread package for multicore systems; MassiveThreads/DM, its extension to distributed memory machines; DAGViz, a performance analyzer specifically designed for task parallel programs; and a task-vectorizing compiler that transforms task parallel programs into vectorized and parallelized instructions. I will end by sharing our prospects on how emerging hardware features and fruitful co-design efforts may help achieve the challenging goal.","PeriodicalId":262414,"journal":{"name":"Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Quest for Unified, Global View Parallel Programming Models for Our Future\",\"authors\":\"K. Taura\",\"doi\":\"10.1145/2931088.2931089\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Developing highly scalable programs on today's HPC machines is becoming ever more challenging, due to decreasing byte-flops ratio, deepening memory/network hierarchies, and heterogeneity. Programmers need to learn a distinct programming API for each layer of the hierarchy and overcome performance issues at all layers, one at a time, when the underlying high-level principle for performance is in fact fairly common across layers---locality. Future programming models must allow the programmer to express locality and parallelism in high level terms and their implementation should map exposed parallelism onto different layers of the machine (nodes, cores, and vector units) efficiently by concerted efforts of compilers and runtime systems. In this talk, I will argue that a global view task parallel programming model is a promising direction toward this goal that can reconcile generality, programmability, and performance at a high level. I will then talk about our ongoing research efforts with this prospect. 
They include: MassiveThreads, a lightweight user-level thread package for multicore systems; MassiveThreads/DM, its extension to distributed memory machines; DAGViz, a performance analyzer specifically designed for task parallel programs; and a task-vectorizing compiler that transforms task parallel programs into vectorized and parallelized instructions. I will end by sharing our prospects on how emerging hardware features and fruitful co-design efforts may help achieve the challenging goal.\",\"PeriodicalId\":262414,\"journal\":{\"name\":\"Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2931088.2931089\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2931088.2931089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Developing highly scalable programs on today's HPC machines is becoming ever more challenging, due to the decreasing byte-per-flop ratio, deepening memory and network hierarchies, and growing heterogeneity. Programmers must learn a distinct programming API for each layer of the hierarchy and overcome performance issues at every layer, one at a time, even though the underlying high-level principle for performance, locality, is in fact largely common across layers. Future programming models must allow the programmer to express locality and parallelism in high-level terms, and their implementations should map the exposed parallelism onto the different layers of the machine (nodes, cores, and vector units) efficiently through the concerted efforts of compilers and runtime systems. In this talk, I will argue that a global-view task-parallel programming model is a promising direction toward this goal, one that can reconcile generality, programmability, and performance at a high level. I will then talk about our ongoing research efforts toward this prospect. They include: MassiveThreads, a lightweight user-level thread package for multicore systems; MassiveThreads/DM, its extension to distributed-memory machines; DAGViz, a performance analyzer specifically designed for task-parallel programs; and a task-vectorizing compiler that transforms task-parallel programs into vectorized and parallelized instructions. I will end by sharing our prospects on how emerging hardware features and fruitful co-design efforts may help achieve this challenging goal.
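To make the programming model concrete, below is a minimal sketch of the global-view, task-parallel style the talk advocates: the programmer expresses all logical parallelism as tasks against a single view of the computation, and the runtime, not the programmer, decides how tasks are mapped onto the machine. The sketch uses standard OpenMP tasks purely for illustration; it is not the MassiveThreads API discussed in the talk, and the choice of Fibonacci as the divide-and-conquer workload is only a conventional example.

```c
/* Global-view task parallelism, sketched with OpenMP tasks (illustrative
 * only; not the MassiveThreads API).  The programmer exposes parallelism
 * by spawning tasks; the runtime schedules them across cores. */
#include <stdio.h>
#include <omp.h>

static long fib(int n)
{
    if (n < 2)
        return n;

    long x, y;

    /* Spawn one recursive call as a child task; which core runs it
     * is the runtime's decision, not the programmer's. */
    #pragma omp task shared(x)
    x = fib(n - 1);

    y = fib(n - 2);         /* The parent computes the other half. */

    #pragma omp taskwait    /* Join: wait for the child task to finish. */
    return x + y;
}

int main(void)
{
    long r;
    #pragma omp parallel
    #pragma omp single      /* One thread seeds the task tree; all threads
                             * in the team help execute the spawned tasks. */
    r = fib(30);
    printf("fib(30) = %ld\n", r);
    return 0;
}
```

The point of the style is that the same source expresses the full parallel structure of the computation; a unified model as envisioned in the talk would let compilers and runtimes map that one structure onto nodes, cores, and vector units alike.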