2009 Eighth International Symposium on Parallel and Distributed Computing: Latest Papers

Event-Driven Configuration of a Neural Network CMP System over a Homogeneous Interconnect Fabric
2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.25
Muhammad Mukaram Khan, J. Navaridas, Alexander D. Rast, Xin Jin, L. Plana, M. Luján, J. V. Woods, J. Miguel-Alonso, S. Furber
Abstract: Configuring a million-core parallel system at boot time is a difficult process when the system has neither specialised hardware support for the configuration process nor a preconfigured default state that puts it in operating condition. SpiNNaker is a parallel Chip Multiprocessor (CMP) system for neural network (NN) simulation. Where most large CMP systems feature a sideband network to complete the boot process, SpiNNaker has a single homogeneous network interconnect for both application inter-processor communications and system control functions such as boot load and run-time user-system interaction. This network improves fault tolerance and makes it easier to support dynamic run-time reconfiguration; however, it requires a boot process that is transaction-level compatible with the application’s communications model. Since SpiNNaker uses event-driven asynchronous communications throughout, the loader operates with purely local control: there is no global synchronisation, state information, or transition sequence. A novel two-stage “unfolding” boot-up process efficiently configures the SpiNNaker hardware and loads the application using a high-speed flood-fill technique with support for run-time re-configuration. SystemC simulation of a multi-CMP SpiNNaker system indicates an error-free CMP configuration time of 1.3ms, while a high-level simulation of a full-scale system (64K CMPs) indicates a mean application-loading time of ~20ms (for a 100KB application), which is virtually independent of the size of the system. We verified the CMP configuration process with hardware-level Verilog simulation.
Citations: 14
Realistic Evaluation of Interconnection Networks Using Synthetic Traffic
2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.20
J. Navaridas, J. Miguel-Alonso
Abstract: Evaluation of high performance parallel systems is a delicate issue, due to the difficulty of generating workloads that represent those that will run on actual systems. We survey the workloads most commonly used for performance evaluation in the scope of interconnection network simulation. Aiming to fill the gap between purely synthetic and application-driven workloads, we present a set of synthetic communication micro-kernels that enhance regular synthetic traffic by adding point-to-point causality. They are conceived to stress the interconnection architecture. As an example of the proposed methodology, we use these micro-kernels to evaluate a topological improvement of k-ary n-cubes.
Citations: 7
Multi-hop Congestion Control Algorithm in Mobile Wireless Networks
2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.30
S. Sahraei, D. Grigoras
Abstract: This paper focuses on the congestion control problem on a multi-hop path in mobile wireless networks. Conventional congestion control algorithms do not consider highly mobile devices in a network where link status changes frequently. The assumption of link validity in proactive algorithms results in high network overhead. In this paper, a new on-demand method of congestion control in multi-hop communication within a wireless network is proposed. The basic idea is to monitor each node’s backpressure and identify the flooded links. The method then provides means for optimizing the congested link and regulating the data rate at the source of network congestion. The main results are an increase in overall network throughput and the possibility of co-existence for nodes with different rates. This algorithm is particularly important in urban settings characterized by a high density of mobile devices.
Citations: 1
Understanding the Memory Behavior of Emerging Multi-core Workloads
2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.14
Junmin Lin, Yu Chen, Wenlong Li, A. Jaleel, Zhizhong Tang
Abstract: This paper characterizes the memory behavior of emerging RMS (recognition, mining, and synthesis) workloads for future multi-core processors. As multi-core processors proliferate across different application domains, and the number of on-die cores continues to increase, a key issue facing processor architects is the design of the on-die last level cache (LLC). In this paper, we explore the LLC design space for multi-threaded RMS workloads by examining the working set sizes, data sharing behavior, and spatial data locality. Our study reveals that these RMS workloads are memory intensive, have large working-set sizes greater than 16MB on average, exhibit a significant amount of data sharing, about 47% on average, and show strong strided streaming access behavior with 77% of accesses in a regular pattern. Based on these observations, we then investigate the potential cache architecture choices for future multi-core design. Our experiments show that for these workloads (a) large DRAM caches can be useful to address their large working sets, e.g., a 128MB DRAM cache can reduce the average L1 miss penalty by 18%; (b) a shared last level cache provides better cache performance than a private cache, e.g., an 8MB shared cache provides a 25% performance improvement over a private one with the same total size; and (c) a stride-based hardware prefetcher provides a significant performance benefit of 25%. As a result, we suggest a memory hierarchy with a 128MB DRAM cache, an 8MB on-die SRAM shared cache and an 8-entry stride prefetcher to accommodate RMS workloads.
Citations: 12
Utilizing Model Checking for Automated Optimization Information Discovery in InDiGO
2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.22
Valeriy A. Kolesnikov, Gurdip Singh
Abstract: The InDiGO framework provides an infrastructure that allows the design of generic but customizable algorithms encapsulated as middleware services, and provides tools to customize such algorithms for specific applications. Such customization allows one to optimize algorithms by removing communication that is redundant in the context of a specific application. Information necessary for optimization is derived by running queries of interest on the application abstraction. Each new query requires a new algorithm to be written that operates on the application abstraction to give a yes or no answer. In this paper, we describe a different approach to answering the queries. It uses model checking and is fully automated. It also allows the queries to be answered precisely and more general properties to be verified. We present experimental results to demonstrate the optimizations when our infrastructure is utilized.
Citations: 3
A Fully Distributed Clustering Algorithm Based on Random Walks
2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.21
A. Bui, Abdurusul Kudireti, D. Sohier
Abstract: In this paper, we present a fully distributed clustering algorithm based on random walks that works on arbitrary topologies. A cluster is composed of a set of nodes called the core, which coordinates the clustering process, and of non-core nodes called ordinary nodes. A core is built through a random-walk-based procedure. Its neighboring nodes that do not belong to any cluster are recruited by the core as ordinary nodes into its cluster. The correctness and termination of our algorithm are proven. We also prove that when two clusters are adjacent, at least one of them has a complete core (i.e. a core with the maximum size allowed by the user). Our algorithm is not deterministic, which allows better load balancing, since the core nodes are not determined by their ids and/or location.
Citations: 10
Distributed Shared Memory for the Cell Broadband Engine (DSMCBE)
2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.40
Morten N. Larsen, K. Skovhede, B. Vinter
Abstract: The CELL-BE processor provides high performance and has been shown to reach a performance close to the theoretical peak; however, the high performance comes at the price of a quite complex programming model. Central to the complexity of the CELL-BE programming model is the need to move data in and out of non-coherent local storage blocks for each special processor element. In this paper we present a software library, namely the Distributed Shared Memory for the Cell Broadband Engine (DSMCBE). By using techniques known from distributed shared memory, DSMCBE allows programmers to program the CELL-BE with relative ease and, in addition, to scale their applications to use multiple CELL-BE processors in a network. Performance experiments show that quite high performance can be obtained with DSMCBE even in a cluster environment.
Citations: 2
Automating Three-Dimensional Reconstruction of Icosahedral Virus Structure with Condensed Graphs
2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.9
Chenqi Wang, Neil Cafferkey, J. Morrison
Abstract: We present a visual executable workflow for the three-dimensional reconstruction of icosahedral virus structure using the Traditional Model (TM) method. This workflow is implemented using WebCom, a metacomputer platform based on the Condensed Graph (CG) model. The CG model allows the application to be constructed with a structure that closely mirrors an abstract description of the workflow. By utilising WebCom's Integrated Development Environment, we also create a workbench environment that facilitates construction of related workflows through the reuse of components developed for the TM workflow. As an example of this component reuse, we outline the construction of a workflow for the alternative Unbiased Model reconstruction method.
Citations: 2
Distributed Causal Model-Based Diagnosis Based on Interacting Behavioral Petri Nets
2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.11
Hammadi Bennoui, A. Chaoui, Kamel Barkaoui
Abstract: This paper deals with the problem of causal model-based diagnosis of distributed systems. The setting we consider is a collection of interacting behavioral Petri nets (BPNs). Each BPN model represents the causal behavioral model of one subsystem and its interactions with neighboring subsystems. Interactions among subsystems are modeled by tokens that pass from one model to another via common places. In a first step, the diagnosis reasoning scheme exploits a backward reachability analysis on each net model to obtain local diagnoses; in a second step, it exploits a forward reachability analysis to ensure that local diagnoses are consistent and form global ones.
Citations: 9
Program Execution Control in a Multi CMP Module System with a Look-Ahead Configured Global Network
2009 Eighth International Symposium on Parallel and Distributed Computing Pub Date : 2009-06-30 DOI: 10.1109/ISPDC.2009.37
E. Laskowski, L. Masko, M. Tudruj, M. Thor
Abstract: The paper presents a method for the optimized control of program execution in modular systems based on Chip Multi Processor (CMP) modules interconnected by a special global interconnection network. The applied CMP modules are based on communication on the fly, which is a novel efficient group communication paradigm implemented inside the interconnection network. Communication on the fly is based on a synergy of dynamic processor switching between memory modules and a data-read-on-the-fly mechanism, which enables many processors to read data simultaneously when it is present on shared memory buses. The paper presents a two-stage scheduling algorithm for programs expressed in a graph notation. The first stage schedules program tasks inside the CMP modules using an algorithm based on the notion of moldable tasks. As a result, a scheduled program moldable task graph is produced. The moldable task graph is next structured for optimized communication execution in the global network, which works according to the look-ahead link connection setting paradigm. Results of simulation experiments evaluate the efficiency and other properties of the proposed architectural solution.
Citations: 1