Proceedings of the 7th ACM international conference on Computing frontiers: latest papers

Proposition for a sequential accelerator in future general-purpose manycore processors and the problem of migration-induced cache misses

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787330

P. Michaud, Yiannakis Sazeides, André Seznec

Abstract: As the number of transistors on a chip doubles with every technology generation, the number of on-chip cores also increases rapidly, making possible in a foreseeable future to design processors featuring hundreds of general-purpose cores. However, though a large number of cores speeds up parallel code sections, Amdahl's law requires speeding up sequential sections too. We argue that it will become possible to dedicate a substantial fraction of the chip area and power budget to achieve high sequential performance. Current general-purpose processors contain a handful of cores designed to be continuously active and run in parallel. This leads to power and thermal constraints that limit the core's performance. We propose removing these constraints with a sequential accelerator (SACC). A SACC consists of several cores designed for ultimate sequential performance. These cores cannot run continuously. A single core is active at any time, the rest of the cores are inactive and power-gated. We migrate the execution periodically to another core to spread heat generation uniformly over the whole SACC area, thus addressing the temperature issue. The SACC will be viable only if it yields significant sequential performance. Migration-induced cache misses may limit performance gains. We propose some solutions to mitigate this problem. We also investigate a migration method using thermal sensors, such that the migration interval depends on the ambient temperature and the migration penalty is negligible under normal thermal conditions.

Citations: 4
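
The Amdahl's law argument in the abstract above can be made concrete with a small calculation. The sketch below is not taken from the paper: the function name `amdahl_speedup` and the `sequential_boost` parameter are illustrative assumptions. It computes the overall speedup when the parallel fraction of a program is spread over many cores and the sequential fraction is sped up by a separate factor, as a sequential accelerator would aim to do.

```python
def amdahl_speedup(parallel_fraction, n_cores, sequential_boost=1.0):
    """Overall speedup under Amdahl's law.

    parallel_fraction: share of the original run time that parallelizes (0..1).
    n_cores:           number of cores sharing the parallel part.
    sequential_boost:  hypothetical speedup applied to the sequential part only
                       (1.0 means no sequential acceleration).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1 or sequential_boost <= 0:
        raise ValueError("n_cores must be >= 1 and sequential_boost > 0")

    sequential_fraction = 1.0 - parallel_fraction
    # Normalized run time: the sequential part shrinks by sequential_boost,
    # the parallel part shrinks by n_cores.
    new_time = sequential_fraction / sequential_boost + parallel_fraction / n_cores
    return 1.0 / new_time


if __name__ == "__main__":
    for boost in (1.0, 2.0):
        print(boost, round(amdahl_speedup(0.95, 100, boost), 1))
```

With 95% parallel code on 100 cores, doubling sequential speed in this model raises the overall speedup from about 16.8 to about 29.0, which illustrates why the sequential sections matter once the core count is large.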

Session details: Power 1

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/3251915

M. Alderighi

Citations: 0

Efficient parallel implementation of multilayer backpropagation networks on SpiNNaker

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787297

Xin Jin, M. Luján, L. Plana, Alexander D. Rast, S. Welbourne, S. Furber

Citations: 8

Session details: Keynote

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/3251909

N. Amato

Citations: 0

ERBIUM: a deterministic, concurrent intermediate representation for portable and scalable performance

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787312

Cupertino Miranda, Philippe Dumont, Albert Cohen, M. Duranton, Antoniu Pop

Abstract: Optimizing compilers and runtime libraries do not shield programmers from the complexity of multi-core hardware; as a result the need for manual, target-specific optimizations increases with every processor generation. High-level languages are being designed to express concurrency and locality without reference to a particular architecture. But compiling such abstractions into efficient code requires a portable, intermediate representation: this is essential for modular composition (separate compilation), for optimization frameworks independent of the source language, and for just-in-time compilation of bytecode languages. This paper introduces Erbium, an intermediate representation for compilers, a low-level language for efficiency programmers, and a lightweight runtime implementation. It relies on a data structure for scalable and deterministic concurrency, called Event Recording, exposing the data-level, task and pipeline parallelism suitable to a given target. We provide experimental evidence of the productivity, scalability and efficiency advantages of Erbium, relying on a prototype implementation in GCC 4.3.

Citations: 2

Towards greener data centers with storage class memory: minimizing idle power waste through coarse-grain management in fine-grain scale

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787340

In-Hwan Doh, Young Jin Kim, Eunsam Kim, Jongmoo Choi, Donghee Lee, S. Noh

Abstract: Studies have shown much of today's data centers are over-provisioned and underutilized. Over-provisioning cannot be avoided as these centers must anticipate peak load with bursty behavior. Under-utilization, to date, has also been unavoidable as systems always had to be ready for that sudden burst of requests that loom just around the corner. Previous research has pointed to turning off systems as one solution, albeit, an infeasible one due to its irresponsiveness. In this paper, we present the feasibility of using new Storage Class Memory (SCM, which encompasses specific developments such as PCM, MRAM, or FeRAM) technology to turn systems on and off with minimum overhead. This feature is used to control systems on the whole (in comparison to previous fine-grained component-wise control) in finer time scale for high responsiveness with minimized power lost to idleness. Our empirical study is done by executing "real trace"-like workloads on a prototype "data center" of embedded systems deploying FeRAM. We quantify the energy savings and performance trade-off by turning idle systems off. We show that our energy savings approach consumes energy in proportion to user requests with configurable service of quality. Based on observations made on this data center, we discuss the requirements for real deployment. Finally, our conclusion is that SCM should not be viewed as just a replacement of RAM, but rather, as a component that could potentially open a whole new field of applications.

Citations: 14

Multiple sleep modes leakage control in peripheral circuits of a all major SRAM-based processor units

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787339

H. Homayoun, Avesta Sasan, Aseem Gupta, A. Veidenbaum, F. Kurdahi, N. Dutt

Abstract: Leakage currents in on-chip SRAMs: caches, branch predictor, register files and TLBs, are major contributors to the energy dissipated by processors in deep sub-micron technologies. High leakage also increases chip temperature and some SRAM-based structures become thermal hotspots. Previous work has addressed major sources of SRAM leakage in memory cells and bit-lines, making remaining SRAM components, in particular large drivers, the primary source of leakage. This paper proposes an approach to reduce this source of leakage in all major SRAM-based units of the processor, controlling them in a uniform way, yet treating each unit individually based on its behavior and memory organization. The new approach uses multiple bias voltages in sleep transistors allowing a trade-off between leakage reduction and wakeup delay in multi-stage peripheral drivers. Four low-power modes are defined, from basic to ultra low power, and SRAMs dynamically transition between these modes to minimize leakage without sacrificing performance. A novel control mechanism monitors and predicts future processor behavior for mode control. The leakage reduction in individual units is evaluated and shown to vary from 25% for IL1 to 75% for L2 caches. Resulting temperature reduction, including the effect of positive feedback between temperature and leakage power, is evaluated. A significant temperature reduction is achieved in each unit. It is also shown to reduce hot spots in the instruction TLB and branch predictor.

Citations: 3
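
The trade-off between leakage reduction and wakeup delay described in the abstract above can be illustrated with a toy mode selector. This is only a sketch under stated assumptions, not the paper's control mechanism: the mode names, wakeup delays, and leakage-saving figures are invented placeholders, and the rule "choose the deepest mode whose wakeup delay fits inside the predicted idle period" is a simplification chosen for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class SleepMode(NamedTuple):
    name: str
    wakeup_cycles: int    # hypothetical delay to return to full operation
    leakage_saved: float  # hypothetical fraction of peripheral leakage removed


# Four modes, from shallow to deep. All values are made up for illustration.
MODES = (
    SleepMode("basic", 1, 0.2),
    SleepMode("low", 2, 0.4),
    SleepMode("aggressive", 4, 0.6),
    SleepMode("ultra", 8, 0.8),
)


def pick_mode(predicted_idle_cycles: int,
              modes: Sequence[SleepMode] = MODES) -> Optional[SleepMode]:
    """Return the mode saving the most leakage among those whose wakeup
    delay does not exceed the predicted idle period, or None to stay awake."""
    best = None
    for mode in modes:
        if mode.wakeup_cycles <= predicted_idle_cycles:
            if best is None or mode.leakage_saved > best.leakage_saved:
                best = mode
    return best


if __name__ == "__main__":
    for idle in (0, 1, 5, 100):
        chosen = pick_mode(idle)
        print(idle, chosen.name if chosen else "stay awake")
```

A deeper mode saves more leakage but takes longer to wake, so the quality of the idle-period prediction decides whether the saving comes at a performance cost.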

Operating system support for mitigating software scalability bottlenecks on asymmetric multicore processors

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787281

J. C. Saez, Alexandra Fedorova, M. Prieto, Hugo Vegas

Abstract: Asymmetric multicore processors (AMP) promise higher performance per watt than their symmetric counterparts, and it is likely that future processors will integrate a few fast out-of-order cores, coupled with a large number of simpler, slow cores, all exposing the same instruction-set architecture (ISA). It is well known that one of the most effective ways to leverage the effectiveness of these systems is to use fast cores to accelerate sequential phases of parallel applications, and to use slow cores for running parallel phases. At the same time, we are not aware of any implementation of this parallelism-aware (PA) scheduling policy in an operating system. So the questions as to whether this policy can be delivered efficiently by the operating system to unmodified applications, and what the associated overheads are remain open. To answer these questions we created two different implementations of the PA policy in OpenSolaris and evaluated it on real hardware, where asymmetry was emulated via CPU frequency scaling. This paper reports our findings with regard to benefits and drawbacks of this scheduling policy.

Citations: 29
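
The parallelism-aware (PA) idea in the abstract above, fast cores for sequential phases and slow cores for parallel phases, can be sketched as a simple placement routine. This is not the OpenSolaris implementation from the paper: the function `pa_assign`, the use of "exactly one runnable thread" as the test for a sequential phase, and the spill rule are all assumptions made for illustration.

```python
def pa_assign(apps, n_fast, n_slow):
    """Place runnable threads on fast or slow cores.

    apps:   dict mapping an application name to its number of runnable threads.
    Returns a dict mapping (app, thread_index) to "fast", "slow", or None
    (None means the thread waits because no core is free).

    Assumption: an application with exactly one runnable thread is treated
    as being in a sequential phase.
    """
    threads = [(name, i) for name, count in apps.items() for i in range(count)]
    # Stable sort: threads of sequential-phase applications come first,
    # so they get first pick of the fast cores.
    threads.sort(key=lambda t: apps[t[0]] != 1)

    placement = {}
    fast_left, slow_left = n_fast, n_slow
    for thread in threads:
        sequential = apps[thread[0]] == 1
        if sequential and fast_left > 0:
            placement[thread] = "fast"
            fast_left -= 1
        elif slow_left > 0:
            placement[thread] = "slow"
            slow_left -= 1
        elif fast_left > 0:
            # Spill onto an idle fast core rather than leave it unused.
            placement[thread] = "fast"
            fast_left -= 1
        else:
            placement[thread] = None
    return placement


if __name__ == "__main__":
    print(pa_assign({"solver": 1, "render": 3}, n_fast=1, n_slow=3))
```

In the example call, the single-threaded `solver` takes the one fast core while the three `render` threads share the slow cores. A real scheduler would also need to detect phase changes at run time and migrate threads, which this sketch does not attempt.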

Self organization on a swarm computing fabric: a new way to look at fault tolerance

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2010-05-17 DOI: 10.1145/1787275.1787343

D. Pani, Simone Secchi, L. Raffo

Abstract: Recent studies have demonstrated the possibility to exploit Swarm Intelligence (SI) as an inspiration for the design of scalable VLSI tiled architectures exhibiting multitasking, adaptability, absence of centralized low-level control and fault-tolerance. SI approach to fault-tolerance, in principle, can be regarded as a reconfiguration-free cell-exclusion mechanism. The key elements at the basis of a reconfiguration free solution are: loose structure of the system, homogeneity, cooperative behaviors and self organization. In this paper, these self organization aspects, introduced in a recently developed multi-agent VLSI tiled architecture for array processing, expressly developed resorting to the SI inspiration, are presented along with some theoretical and experimental results. The architecture presents two forms of cell-exclusion (bypass and block of faulty elements), implementing self-adaptive behaviors rather than reconfiguration to face faults preserving system functionality. The proposed approach, exploiting indirect communications to provide workload spreading into the computing fabric, is also successful in reducing the effects of the presence of faulty elements without spare resources and with limited performance degradation.

Citations: 3

Supporting lock-free composition of concurrent data objects

Proceedings of the 7th ACM international conference on Computing frontiers Pub Date : 2009-10-02 DOI: 10.1145/1787275.1787286

Daniel Cederman, P. Tsigas

Citations: 6