{"title":"Toward a Self-Aware Codelet Execution Model","authors":"Stéphane Zuckerman, A. Landwehr, Kelly Livingston, G. Gao","doi":"10.1109/DFM.2014.12","DOIUrl":"https://doi.org/10.1109/DFM.2014.12","url":null,"abstract":"Future extreme-scale supercomputers will feature arrays of general-purpose and specialized many-core processors, totaling thousands of cores on a single chip. In general, many-core chips will most likely resemble a \"hierarchical and distributed system on chip.\" It is expected that such systems will be hard to exploit not only for performance, but will also need to deal with reliability issues, as well as power and energy issues. The Codelet Model is a fine-grain dataflow-inspired and event-driven program execution model which was designed to run parallel programs on a combination of such many-core chips into a supercomputer. Meanwhile, some on-going work is attempting to take into account user goals as well as resource usage and make the system \"self-aware:\" By using introspective means, this kind of research tries to have the system software modify the state of the overall system at run-time to satisfy the user goals. It is very likely that future extreme-scale systems will be in constant demand of different kinds of resources, may they be processing elements (general purpose or otherwise), bandwidth, power budget, etc. This paper takes the position that a potential solution to solve the resource management issue at this scale is a hierarchical and distributed self-aware system leveraging the fine-grain event-driven codelet threading model.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122264412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Holistic Dataflow-Inspired System Design","authors":"Stéphane Zuckerman, Haitao Wei, G. Gao, H. Wong, J. Gaudiot, A. Louri","doi":"10.1109/DFM.2014.16","DOIUrl":"https://doi.org/10.1109/DFM.2014.16","url":null,"abstract":"Computer systems have undergone a fundamental transformation recently, from single-core processors to devices with increasingly higher core counts within a single chip. The semi-conductor industry now faces the infamous power and utilization walls. To meet these challenges, heterogeneity in design, both at the architecture and technology levels, will be the prevailing approach for energy efficient computing as specialized cores, accelerators, etc., can eliminate the energy overheads of general-purpose homogeneous cores. However, with future technological challenges pointing in the direction of on-chip heterogeneity, and because of the traditional difficulty of parallel programming, it becomes imperative to produce new system software stacks that can take advantage of the heterogeneous hardware. As a case in point, the core count per chip continues to increase dramatically while the available on-chip memory per core is only getting marginally bigger. Thus, data locality, already a must-have in high-performance computing, will become even more critical as memory technology progresses. In turn, this makes it crucial that new execution models be developed to better exploit the trends of future heterogeneous computing in many-core chips. To solve these issues, we propose a cross-cutting cross-layer approach to address the challenges posed by future heterogeneous many-core chips.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122309655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Limits of Statically-Scheduled Token Dataflow Processing","authors":"Nachiket Kapre, Siddhartha","doi":"10.1109/DFM.2014.21","DOIUrl":"https://doi.org/10.1109/DFM.2014.21","url":null,"abstract":"FPGA-based token dataflow processing has been shown to accelerate hard-to-parallelize problems exhibiting irregular dataflow parallelism by as much as an order of magnitude when compared to conventional compute organizations. However, when the structure of the dataflow computation is known upfront, either at compile time or at the start of execution, we can employ static scheduling techniques to further improve performance and enhance compute density of the dataflow hardware. In this paper, we identify the costs and performance trends of both static and dynamic scheduling approaches when considering hardware acceleration of SPICE device equations and Sparse LU factorization in circuit graphs. While the experiments are limited to a case study, the hardware design and dataflow compiler are general and can be extended to other problems and instances where dataflow computing may be applicable. With this study, we hope to develop a quantitative basis for the design of a hybrid dataflow architecture that combines both static and dynamic scheduling techniques. We observe a performance benefit of 2 - 4× and a resource utilization saving of 2 - 3× in favor of statically scheduled hardware.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126320308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Language Features for Scalable Distributed-Memory Dataflow Computing","authors":"J. Wozniak, M. Wilde, Ian T Foster","doi":"10.1109/DFM.2014.17","DOIUrl":"https://doi.org/10.1109/DFM.2014.17","url":null,"abstract":"Dataflow languages offer a natural means to express concurrency but are not a natural representation of the architectural features of high-performance, distributed-memory computers. When used as the outermost language in a hierarchical programming model, dataflow is very effective at expressing the overall flow of a computation. In this work, we present strategies and techniques used by the Swift dataflow language to obtain good performance on extremely large computing systems. We also present multiple unique language features that offer practical utility and performance enhancements.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116233811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing the StreamIt and SC Languages for Manycore Processors","authors":"XuanKhanh Do, Stéphane Louise, Albert Cohen","doi":"10.1109/DFM.2014.13","DOIUrl":"https://doi.org/10.1109/DFM.2014.13","url":null,"abstract":"Embedded many-core systems offering thousands of cores should be available in the near future. Stream programming is a particular instance of data-flow programming where computations are expressed as the data-driven execution of repetitive \"filters\" on data streams. Stream programming fits these manycore systems' requirements in terms of parallelism, functional determinism, and local data reuse. Statically or semi-dynamically scheduled stream languages like e.g. StreamIt and ?C can generate very efficient parallel code, but have strict limitations with respect to the expression of dynamic computational tasks, context-dependent modes of operation, and dynamic memory management. This paper compares two state-of-the-art stream languages, StreamIt and ?C, with the aim of better understanding their strengths and weaknesses, and finding a way to improve them. We also propose an automatic conversion method and tool to transform between these two languages. This tool allows to port and evaluate the classical StreamIt benchmarks on Kalray's MPPA, a real-world many-core processor representative of tomorrow's embedded many-core chips. We conclude with propositions for the evolution of stream-programming models.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133745242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Feasibility of a Codelet Based Multi-core Operating System","authors":"J. Dennis, G. Gao","doi":"10.1109/DFM.2014.18","DOIUrl":"https://doi.org/10.1109/DFM.2014.18","url":null,"abstract":"We believe it is feasible to build a multi-core operating system that implements virtual memory, and honors the principles of modular software construction, using runtime software that executes a codelet program execution model. Performance and energy efficiency can be enhanced through co-design of new architecture features that replace resource management functions of runtime software with efficient hardware mechanisms. The resulting systems will offer benefits in programmability, application portability and reuse absent in current systems.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129526984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DFGR an Intermediate Graph Representation for Macro-Dataflow Programs","authors":"A. Sbîrlea, L. Pouchet, Vivek Sarkar","doi":"10.1109/DFM.2014.9","DOIUrl":"https://doi.org/10.1109/DFM.2014.9","url":null,"abstract":"In this paper we propose a new intermediate graph representation for macro-dataflow programs, DFGR, which is capable of offering a high-level view of applications for easy programmability, while allowing the expression of complex applications using dataflow principles. DFGR makes it possible to write applications in a manner that is oblivious of the underlying parallel runtime, and can easily be targeted by both programming systems and domain experts. In addition, DFGR can use further optimizations in the form of graph transformations, enabling the coupling of static and dynamic scheduling and efficient task composition and assignment, for improved scalability and locality. We show preliminary performance results for an implementation of DFGR on a shared memory runtim system, offering speedups of up to 11× on 12 cores, for complex graphs.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126631224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchically Tiled Array as a High-Level Abstraction for Codelets","authors":"Chih-Chieh Yang, J. C. Pichel, Adam R. Smith, D. Padua","doi":"10.1109/DFM.2014.11","DOIUrl":"https://doi.org/10.1109/DFM.2014.11","url":null,"abstract":"The move from terascale to exascale systems is challenging in terms of energy and power consumption, resilience, storage, concurrency, and parallelism. These challenges require new fine-grain execution models to support the concurrent execution of millions or even billions of threads on the exascale machines. The most promising approaches are those based on the codelet execution model, which provide a flexible programming interface that allows the expression of all kinds of parallelism with fine-tuning opportunities. We propose using Hierarchically Tiled Array (HTA) as a high-level abstraction for codelets to improve the programmability and readability of programs while preserving the good performance and scalability provided by the codelet execution model.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"22 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116394596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asynchronous Task Scheduling of the Fast Multipole Method Using Various Runtime Systems","authors":"Bo Zhang","doi":"10.1109/DFM.2014.14","DOIUrl":"https://doi.org/10.1109/DFM.2014.14","url":null,"abstract":"In this paper, we explore data-driven execution of the adaptive fast multipole method by asynchronously scheduling available computational tasks using Cilk, C++11 standard thread and future libraries, the High Performance ParalleX (HPX-5) library, and OpenMP tasks. By comparing these implementations using various input data sets, this paper examines the runtime system's capability to spawn new task, the capacity of the tasks that can be managed, the performance impact between eager and lazy thread creation for new task, and the effectiveness of the task scheduler and its ability to recognize the critical path of the underlying algorithm.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114881531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Clockless Computing System Based on the Static Dataflow Paradigm","authors":"L. Verdoscia, R. Vaccaro, R. Giorgi","doi":"10.1109/DFM.2014.10","DOIUrl":"https://doi.org/10.1109/DFM.2014.10","url":null,"abstract":"The ambitious challenges posed by next exascale computing systems may require a critical re-examination of both architecture design and consolidated wisdom in terms of programming style and execution model, because such systems are expected to be constituted by thousands of processors with thousands of cores per chip. But how to build exascale architectures remains an open question.This paper presents a novel computing system based on a configurable architecture and a static dataflow execution model. We assume that the basic computational unit is constituted by a dataflow graph. Each processing node is constituted by an ad hoc kernel processor - designed to manage and schedule dataflow graphs, and a manycore dataflow execution engine - designed to execute such dataflow graphs.The main components of the dataflow execution engine are the Dataflow Actor Cores (DACs), which are small, identical and configurable. The major contributions of this paper are: i) the introduction of a machine language (named D#) which represents the low-level static configuration information of the system; ii) the introduction of a self-scheduled clockless mechanism to start operations on the presence of validity tokens only; iii) a design that avoids the need of temporary storage for tokens on the links of the DACs.Our preliminary tests on FPGA-based hardware show the feasibility of this approach.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130611967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}