Proceedings of the 7th ACM international conference on Computing frontiers: latest papers, page 4

Novel low-cost aging sensor

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787299

M. Omaña, Daniele Rossi, N. Bosio, C. Metra

Citations: 15

A heterogeneous parallel system running open mpi on a broadband network of embedded set-top devices

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787324

Richard Neill, Alexander Shabarshin, L. Carloni

Abstract: We present a heterogeneous parallel computing system that combines a traditional computer cluster with a broadband network of embedded set-top box (STB) devices. As multiple service operators (MSOs) manage millions of these devices across wide geographic areas, the computational power of such a massively distributed embedded system could be harnessed to realize a centrally managed, energy-efficient parallel processing platform that supports a variety of application domains of interest to MSOs, consumers, and the high-performance computing research community. We investigate the feasibility of this idea by building a prototype system that includes a complete head-end cable system with a DOCSIS-2.0 network combined with an interoperable implementation of a subset of Open MPI running on the STB embedded operating system. We evaluate the performance and scalability of our system compared to a traditional cluster by approximately solving various instances of the Multiple Sequence Alignment bioinformatics problem, while the STBs simultaneously continue to perform their primary functions: decoding MPEG streams for television display and running an interactive user interface. Based on our experimental results and given the technology trends in embedded computing, we argue that our approach of leveraging a broadband network of embedded devices in a heterogeneous distributed system offers the benefits of both parallel computing clusters and distributed Internet computing.

Citations: 9

Combining deblurring and denoising for handheld HDR imaging in low light conditions

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787303

P. Lakshman

Citations: 3

Session details: Caches and branches 2

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/3251913

C. Trinitis

Citations: 0

Session details: Keynote

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/3251918

G. Bilardi

Citations: 0

On-chip communication and synchronization mechanisms with cache-integrated network interfaces

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787328

S. Kavadias, M. Katevenis, M. Zampetakis, Dimitrios S. Nikolopoulos

Abstract: Per-core local (scratchpad) memories allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces (NIs), appropriate for scalable multicores, that combine the best of two worlds, the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized NI functions. This paper presents our architecture, which provides local and remote scratchpad access to either individual words or multi-word blocks through RDMA copy. Furthermore, we introduce event responses as a mechanism for software-configurable synchronization primitives. We present three event response mechanisms that expose NI functionality to software: multi-word transfer initiation, memory barriers for explicitly selected accesses of arbitrary size, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype, and evaluated the on-chip communication performance on the prototype as well as on a CMP simulator with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks and efficient exploitation of the hardware bandwidth.

Citations: 35

Interval-based models for run-time DVFS orchestration in superscalar processors

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787338

G. Keramidas, Vasileios Spiliopoulos, S. Kaxiras

Abstract: We develop two simple interval-based models for dynamic superscalar processors. These models allow us to: i) predict with great accuracy performance and power consumption under various frequency and voltage combinations and ii) implement targeted DVFS policies at run-time. The models analyze program execution in two kinds of intervals: steady-state and miss-event intervals. Miss-event intervals are signalled by miss events (L2 misses in our case) that upset the "steady state" execution of the program, and they end when the pipeline again reaches a steady state. The first model is fed by an approximation of the stall cycles (the time the processor instruction window is blocked) due to long-latency L2 misses. The second model improves on this approximation using as input the occupancy of the L2's miss-handling registers (MSHRs). Despite their simplicity, these models prove to be accurate in predicting the performance (and energy) for any target frequency/voltage setting, yielding average errors of 2.1% and 0.2% respectively. Besides modelling, we show that the methodology we propose is powerful enough to implement (at run-time) various DVFS policies: "operate at optimal EDP" or "ED2P," or even "reduce ED2P within specific performance constraints." Approaches based on the two models require minimal hardware cost: two counters for measuring the duration of the steady-state and the miss-event intervals, and some comparison logic. To validate our methodology we use a cycle-accurate simulator and the benchmarks provided by the SPEC2K suite. Our results indicate that our proposed run-time mechanism is able to orchestrate different DVFS policies with great success, yielding negligible errors, below 1.5% on average.

Citations: 93
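The interval idea in the DVFS entry above can be illustrated with a small sketch. This is not the authors' model: it assumes that steady-state cycles scale with the core clock, that the time spent in L2-miss intervals is fixed in seconds because it is set by memory, and that power follows a dynamic-only P = c_eff * V^2 * f law. The names `IntervalProfile`, `from_counters`, `predict_time`, `predict_energy`, `best_setting`, and the constant `c_eff` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntervalProfile:
    steady_cycles: float  # cycles counted in steady-state intervals
    miss_seconds: float   # wall-clock time spent in miss-event intervals


def from_counters(steady_cycles, miss_cycles, current_hz):
    """Build a profile from the two interval counters sampled at current_hz."""
    # Assumption: L2-miss latency is set by memory, so its duration in
    # seconds stays the same when the core clock changes.
    return IntervalProfile(steady_cycles, miss_cycles / current_hz)


def predict_time(profile, freq_hz):
    """Predicted run time in seconds at a target core frequency."""
    return profile.steady_cycles / freq_hz + profile.miss_seconds


def predict_energy(profile, freq_hz, volts, c_eff=1e-9):
    """Predicted energy in joules under an assumed P = c_eff * V^2 * f model."""
    power = c_eff * volts ** 2 * freq_hz  # dynamic power only (assumption)
    return power * predict_time(profile, freq_hz)


def best_setting(profile, settings, delay_exponent=1):
    """Pick the (freq_hz, volts) pair minimising E * T**delay_exponent."""
    def cost(setting):
        freq_hz, volts = setting
        energy = predict_energy(profile, freq_hz, volts)
        return energy * predict_time(profile, freq_hz) ** delay_exponent
    return min(settings, key=cost)


# Made-up counter values: 2e9 steady cycles and 1e9 miss cycles at 2 GHz.
profile = from_counters(steady_cycles=2e9, miss_cycles=1e9, current_hz=2e9)
settings = [(2e9, 1.2), (1e9, 0.9)]
print(predict_time(profile, 2e9), predict_time(profile, 1e9))
print(best_setting(profile, settings, delay_exponent=1))  # EDP policy
print(best_setting(profile, settings, delay_exponent=2))  # ED2P policy
```

With these illustrative numbers the EDP policy picks the slower setting and the ED2P policy picks the faster one, which shows how the same two counters can drive different policies. The abstract's second model refines the stall estimate using MSHR occupancy; that refinement is not modelled here.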

Hybrid parallel programming with MPI and unified parallel C

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787323

James Dinan, P. Balaji, E. Lusk, P. Sadayappan, R. Thakur

Abstract: The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access (RA) benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance. Using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33; using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.

Citations: 56
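The MPI and UPC entry above relies on a global address space that spans the memories of several nodes. As a rough illustration of what that means, the sketch below mimics the block-cyclic layout UPC uses for a shared array declared with a block size, where blocks of consecutive elements are dealt out to threads in round-robin order. It is plain Python, not UPC, and the function name `owner_and_offset` and the parameters `block` and `threads` are hypothetical.

```python
def owner_and_offset(i, block, threads):
    """Map global index i to (owning thread, index within that thread's part).

    Mirrors a block-cyclic layout: blocks of `block` consecutive elements
    are assigned to threads 0, 1, ..., threads - 1 and then wrap around.
    """
    blk = i // block                      # which block the element falls in
    thread = blk % threads                # blocks are dealt round-robin
    local = (blk // threads) * block + i % block
    return thread, local


# Ten elements, blocks of two, three threads.
for i in range(10):
    print(i, owner_and_offset(i, block=2, threads=3))
```

Each thread holds only its own blocks locally, so the array as a whole can be larger than any single node's memory, which is the property the abstract says lets memory-constrained MPI codes process larger data sets.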

Global management of cache hierarchies

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787315

M. Zahran, S. Mckee

Citations: 14

SpiNNaker: impact of traffic locality, causality and burstiness on the performance of the interconnection network

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787278

J. Navaridas, L. Plana, J. Miguel-Alonso, M. Luján, S. Furber

Abstract: The SpiNNaker system is a biologically inspired massively parallel architecture of bespoke multi-core Systems-on-Chip. The aim of its design is to simulate up to a billion spiking neurons in (biological) real-time. Packets in SpiNNaker represent neural spikes, and these travel through the two-dimensional triangular torus network that connects the over 65 thousand nodes housed in the largest size of SpiNNaker. The research question that we explore is the impact that spatial locality, temporal causality and burstiness of the traffic have on the performance of such an interconnection network. Given the limited knowledge of neuron activity patterns, we propose and use synthetic traffic patterns which resemble biological neural traffic and allow tuning of spatial locality. Causality is explored by means of temporal patterns that maintain a specified overall network load while allowing autonomous causal traffic generation at the node level. Part of the traffic is generated automatically, but the remaining traffic is triggered by a spike arrival in the form of a packet or a burst of packets, as neural stimuli do. In this way, we generate non-uniform traffic patterns with an evolving concentration of activity at nodes which contain more active parts of the spiking neural network. Given the application domain, the simulation-based study focuses on the real-time behavior of the system rather than on standard HPC network metrics. The results show that the interconnection network of SpiNNaker can operate without dropping packets with traffic loads more than 3.5 times those required to simulate 10^9 spiking neurons, despite using non-local traffic. We also find that increments in the degree of traffic causality do not affect the performance of the system, but burstiness in the traffic can hurt performance.

Citations: 15
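The two-dimensional triangular torus named in the SpiNNaker entry above can be sketched as a grid in which every node has six links, with wraparound at the edges. The code below is an illustration under assumptions, not the simulator used in the paper: the link directions, the grid size, and the function names `neighbors`, `hop_distance`, and `bfs_distance` are hypothetical. The abstract does not give the grid dimensions; a 256 x 256 grid would yield 65,536 nodes, which is consistent with the "over 65 thousand" figure.

```python
from collections import deque

# Six link directions assumed for a triangular torus: E, W, N, S, NE, SW.
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]


def neighbors(x, y, width, height):
    """Return the six neighbours of (x, y), wrapping around the torus edges."""
    return [((x + dx) % width, (y + dy) % height) for dx, dy in DIRECTIONS]


def _tri_len(a, b):
    # Hops for displacement (a, b) with no wraparound: a diagonal link covers
    # both axes at once, but only when a and b have the same sign.
    if a * b >= 0:
        return max(abs(a), abs(b))
    return abs(a) + abs(b)


def hop_distance(src, dst, width, height):
    """Minimal hop count, trying the direct and wrapped displacement per axis."""
    d = (dst[0] - src[0]) % width
    e = (dst[1] - src[1]) % height
    return min(_tri_len(a, b) for a in (d, d - width) for b in (e, e - height))


def bfs_distance(src, dst, width, height):
    """Brute-force reference: breadth-first search over the link graph."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nxt in neighbors(node[0], node[1], width, height):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    raise ValueError("destination not reachable")


W, H = 5, 4  # small illustrative torus
print(neighbors(0, 0, W, H))
print(hop_distance((0, 0), (4, 3), W, H), bfs_distance((0, 0), (4, 3), W, H))
```

A hop-distance function of this kind is one way to quantify the spatial locality that the abstract's synthetic traffic patterns tune, since destinations can be drawn preferentially from nodes within a small number of hops.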