2009 Eighth International Symposium on Parallel and Distributed Computing: Latest Papers, Page 2

Event-Driven Configuration of a Neural Network CMP System over a Homogeneous Interconnect Fabric

2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.25

Muhammad Mukaram Khan, J. Navaridas, Alexander D. Rast, Xin Jin, L. Plana, M. Luján, J. V. Woods, J. Miguel-Alonso, S. Furber

Abstract: Configuring a million-core parallel system at boot time is difficult when the system has neither specialised hardware support for the configuration process nor a preconfigured default state that puts it in operating condition. SpiNNaker is a parallel Chip Multiprocessor (CMP) system for neural network (NN) simulation. Where most large CMP systems feature a sideband network to complete the boot process, SpiNNaker has a single homogeneous network interconnect for both application inter-processor communications and system control functions such as boot load and run-time user-system interaction. This network improves fault tolerance and makes it easier to support dynamic run-time reconfiguration; however, it requires a boot process that is transaction-level compatible with the application's communications model. Since SpiNNaker uses event-driven asynchronous communications throughout, the loader operates with purely local control: there is no global synchronisation, state information, or transition sequence. A novel two-stage "unfolding" boot-up process efficiently configures the SpiNNaker hardware and loads the application using a high-speed flood-fill technique with support for run-time reconfiguration. SystemC simulation of a multi-CMP SpiNNaker system indicates an error-free CMP configuration time of 1.3 ms, while a high-level simulation of a full-scale system (64K CMPs) indicates a mean application-loading time of ∼20 ms (for a 100 KB application), which is virtually independent of the size of the system. We verified the CMP configuration process with hardware-level Verilog simulation.
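The flood-fill loading mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' loader: it assumes a 2D torus in which each node has four neighbours (the abstract does not state the topology), and the names `flood_fill`, `width`, `height`, and `origin` are hypothetical. Each node forwards a block to its neighbours the first time it sees it, so the decision to forward is purely local.

```python
from collections import deque

def flood_fill(width, height, origin=(0, 0)):
    """Return {node: hop count at which the node first received the block}.

    Assumption: a width x height torus with 4 neighbours per node.
    """
    received = {origin: 0}          # models each node's local "already seen" flag
    frontier = deque([origin])
    while frontier:
        x, y = frontier.popleft()
        hop = received[(x, y)]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = ((x + dx) % width, (y + dy) % height)
            if nb not in received:  # forward only to nodes that have not seen the block
                received[nb] = hop + 1
                frontier.append(nb)
    return received

hops = flood_fill(4, 4)
print(len(hops), max(hops.values()))
```

In this simplified model every node is reached without any central coordinator, and the number of hops is bounded by the network diameter rather than by the node count.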

Citations: 14

Realistic Evaluation of Interconnection Networks Using Synthetic Traffic

2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.20

J. Navaridas, J. Miguel-Alonso

Citations: 7

Multi-hop Congestion Control Algorithm in Mobile Wireless Networks

2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.30

S. Sahraei, D. Grigoras

Citations: 1

Understanding the Memory Behavior of Emerging Multi-core Workloads

2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.14

Junmin Lin, Yu Chen, Wenlong Li, A. Jaleel, Zhizhong Tang

Abstract: This paper characterizes the memory behavior of emerging RMS (recognition, mining, and synthesis) workloads for future multi-core processors. As multi-core processors proliferate across different application domains, and the number of on-die cores continues to increase, a key issue facing processor architects is the design of the on-die last-level cache (LLC). In this paper, we explore the LLC design space for multi-threaded RMS workloads by examining the working set sizes, data sharing behavior, and spatial data locality. Our study reveals that these RMS workloads are memory intensive, have large working-set sizes (greater than 16 MB on average), exhibit a significant amount of data sharing (about 47% on average), and show strong strided streaming access behavior, with 77% of accesses following a regular pattern. Based on these observations, we then investigate the potential cache architecture choices for future multi-core designs. Our experiments show that for these workloads (a) large DRAM caches can be useful to address their large working sets, e.g., a 128 MB DRAM cache can reduce the average L1 miss penalty by 18%; (b) a shared last-level cache provides better cache performance than a private cache, e.g., an 8 MB shared cache provides a 25% performance improvement over a private one with the same total size; and (c) a stride-based hardware prefetcher provides a significant performance benefit of 25%. As a result, we suggest a memory hierarchy with a 128 MB DRAM cache, an 8 MB on-die SRAM shared cache, and an 8-entry stride prefetcher to accommodate RMS workloads.
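To make the idea of a stride prefetcher concrete, here is a minimal software sketch. It is an illustration only, not the design evaluated in the paper: the table indexed by program counter, the LRU eviction, and the "same stride seen twice" trigger are all assumptions, and `StridePrefetcher` and `access` are hypothetical names. Only the default of 8 entries is taken from the abstract.

```python
from collections import OrderedDict

class StridePrefetcher:
    """Toy stride detector: one table entry per load instruction (pc)."""

    def __init__(self, entries=8):
        self.entries = entries
        self.table = OrderedDict()  # pc -> (last_addr, last_stride), in LRU order

    def access(self, pc, addr):
        """Record an access; return an address to prefetch, or None."""
        prediction = None
        if pc in self.table:
            last_addr, last_stride = self.table.pop(pc)
            stride = addr - last_addr
            # Assumed trigger: the same non-zero stride observed twice in a row.
            if stride != 0 and stride == last_stride:
                prediction = addr + stride
        else:
            stride = 0
            if len(self.table) >= self.entries:
                self.table.popitem(last=False)  # evict the least recently used entry
        self.table[pc] = (addr, stride)         # re-insert as most recently used
        return prediction

p = StridePrefetcher()
print([p.access(0x40, a) for a in (100, 164, 228, 292)])
```

The first two accesses only train the entry; from the third access on, the sketch predicts the next address in the stream, which is the kind of regular access pattern the abstract reports for RMS workloads.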

Citations: 12

Utilizing Model Checking for Automated Optimization Information Discovery in InDiGO

2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.22

Valeriy A. Kolesnikov, Gurdip Singh

Citations: 3

A Fully Distributed Clustering Algorithm Based on Random Walks

2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.21

A. Bui, Abdurusul Kudireti, D. Sohier

Citations: 10

Distributed Shared Memory for the Cell Broadband Engine (DSMCBE)

2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.40

Morten N. Larsen, K. Skovhede, B. Vinter

Citations: 2

Automating Three-Dimensional Reconstruction of Icosahedral Virus Structure with Condensed Graphs

2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.9

Chenqi Wang, Neil Cafferkey, J. Morrison

Citations: 2

Distributed Causal Model-Based Diagnosis Based on Interacting Behavioral Petri Nets

2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.11

Hammadi Bennoui, A. Chaoui, Kamel Barkaoui

Citations: 9

Program Execution Control in a Multi CMP Module System with a Look-Ahead Configured Global Network

2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.37

E. Laskowski, L. Masko, M. Tudruj, M. Thor

Abstract: The paper presents a method for the optimized control of program execution in modular systems based on Chip Multi Processor (CMP) modules interconnected by a special global interconnection network. The applied CMP modules are based on communication on the fly, which is a novel efficient group communication paradigm implemented inside the interconnection network. Communication on the fly is based on a synergy of dynamic processor switching between memory modules and a data-read-on-the-fly mechanism, which enables many processors to read data simultaneously when it is present on shared memory buses. The paper presents a two-stage scheduling algorithm for programs expressed in a graph notation. The first stage schedules program tasks inside the CMP modules using an algorithm based on the notion of moldable tasks. As a result, a scheduled moldable task graph of the program is produced. The moldable task graph is next structured for optimized communication execution in the global network, which works according to the look-ahead link connection setting paradigm. Results of simulation experiments evaluate the efficiency and other properties of the proposed architectural solution.

Citations: 1