2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)最新文献_第3页

A novel model for system-level decision making with combined ASP and SMT solving 结合ASP和SMT求解的系统级决策新模型

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.230

Alexander Biewer, J. Gladigau, C. Haubelt

引用次数: 4

Time-critical computing on a single-chip massively parallel processor 单片大规模并行处理器上的时间关键计算

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.110

B. Dinechin, D. V. Amstel, Marc Poulhiès, Guillaume Lager

{"title":"Time-critical computing on a single-chip massively parallel processor","authors":"B. Dinechin, D. V. Amstel, Marc Poulhiès, Guillaume Lager","doi":"10.7873/DATE.2014.110","DOIUrl":"https://doi.org/10.7873/DATE.2014.110","url":null,"abstract":"The requirement of high performance computing at low power can be met by the parallel execution of an application on a possibly large number of programmable cores. However, the lack of accurate timing properties may prevent parallel execution from being applicable to time-critical applications. We illustrate how this problem has been addressed by suitably designing the architecture, implementation, and programming model, of the Kalray MPPA®-256 single-chip many-core processor. The MPPA® -256 (Multi-Purpose Processing Array) processor integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These VLIW cores are distributed across 16 compute clusters and 4 I/O subsystems, each with a locally shared memory. On-chip communication and synchronization are supported by an explicitly addressed dual network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem. Off-chip interfaces include DDR, PCI and Ethernet, and a direct access to the NoC for low-latency processing of data streams. The key architectural features that support time-critical applications are timing compositional cores, independent memory banks inside the compute clusters, and the data NoC whose guaranteed services are determined by network calculus. The programming model provides communicators that effectively support distributed computing primitives such as remote writes, barrier synchronizations, active messages, and communication by sampling. POSIX time functions expose synchronous clocks inside compute clusters and mesosynchronous clocks across the MPPA®-256 processor.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"14 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89111735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 174

Energy-efficient scheduling for memory-intensive GPGPU workloads 高效调度内存密集型GPGPU工作负载

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.032

Seokwoo Song, Minseok Lee, John Kim, Woong Seo, Yeon-Gon Cho, Soojung Ryu

{"title":"Energy-efficient scheduling for memory-intensive GPGPU workloads","authors":"Seokwoo Song, Minseok Lee, John Kim, Woong Seo, Yeon-Gon Cho, Soojung Ryu","doi":"10.7873/DATE.2014.032","DOIUrl":"https://doi.org/10.7873/DATE.2014.032","url":null,"abstract":"High performance for a GPGPU workload is obtained by maximizing parallelism and fully utilizing the available resources. However, this is not necessarily energy efficient, especially for memory-intensive GPGPU workloads. In this work, we propose Throttle CTA (cooperative-thread array) Scheduling (TCS) where we leverage two type of throttling - throttling the number of actives cores and throttling of warp execution in the cores - to improve energy-efficiency for memory-intensive GPGPU workloads. The algorithm requires the global CTA or thread block scheduler to reduce the number of cores with assigned thread blocks while leveraging the local warp scheduler to throttle memory requests for some of the cores to further reduce power consumption. The proposed TCS scheduling does not require off-line analysis but can be done dynamically during execution. Instead of relying on conventional metrics such as miss-per-kilo-instruction (MPKI), we leverage the memory access latency metric to determine the memory intensity of the workloads. Our evaluations show that TCS reduces energy by up to 48% (38% on average) across different memory-intensive workload while having very little impact on performance for compute-intensive workloads.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"61 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83120904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 13

Improving STT-MRAM density through multibit error correction 通过多比特纠错提高STT-MRAM密度

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI: 10.7873/DATE2014.195

Brandon Del Bel, Jongyeon Kim, C. Kim, S. Sapatnekar

引用次数: 54

Resolving the memory bottleneck for single supply near-threshold computing 解决单电源近阈值计算的内存瓶颈问题

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.215

T. Gemmeke, M. Sabry, J. Stuijt, P. Raghavan, F. Catthoor, David Atienza Alonso

引用次数: 15

Energy efficient data flow transformation for Givens Rotation based QR Decomposition 基于给定旋转QR分解的高能效数据流转换

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.224

Namita Sharma, P. Panda, Min Li, Prashant Agrawal, F. Catthoor

引用次数: 4

Spatial pattern prediction based management of faulty data caches 基于空间模式预测的故障数据缓存管理

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.073

G. Keramidas, Michail Mavropoulos, Anna Karvouniari, D. Nikolos

引用次数: 10

Temperature aware energy-reliability trade-offs for mapping of throughput-constrained applications on multimedia MPSoCs 多媒体mpsoc上吞吐量受限应用映射的温度感知能源可靠性权衡

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.115

Anup Das, Akash Kumar, B. Veeravalli

{"title":"Temperature aware energy-reliability trade-offs for mapping of throughput-constrained applications on multimedia MPSoCs","authors":"Anup Das, Akash Kumar, B. Veeravalli","doi":"10.7873/DATE.2014.115","DOIUrl":"https://doi.org/10.7873/DATE.2014.115","url":null,"abstract":"This paper proposes a design-time (offline) analysis technique to determine application task mapping and scheduling on a multiprocessor system and the voltage and frequency levels of all cores (offline DVFS) that minimize application computation and communication energy, simultaneously minimizing processor aging. The proposed technique incorporates (1) the effect of the voltage and frequency on the temperature of a core; (2) the effect of neighboring cores' voltage and frequency on the temperature (spatial effect); (3) pipelined execution and cyclic dependencies among tasks; and (4) the communication energy component which often constitutes a significant fraction of the total energy for multimedia applications. The temperature model proposed here can be easily integrated in the design space exploration for multiprocessor systems. Experiments conducted with MPEG-4 decoder on a real system demonstrate that the temperature using the proposed model is within 5% of the actual temperature clearly demonstrating its accuracy. Further, the overall optimization technique achieves 40% savings in energy consumption with 6% increase in system lifetime.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"23 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85183763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 44

PUF modeling attacks: An introduction and overview PUF建模攻击:介绍和概述

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.361

U. Rührmair, J. Sölter

引用次数: 91

Advanced system on a chip design based on controllable-polarity FETs 基于可控极性场效应管的先进片上系统设计

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.248

P. Gaillardon, L. Amarù, Jian Zhang, G. Micheli

{"title":"Advanced system on a chip design based on controllable-polarity FETs","authors":"P. Gaillardon, L. Amarù, Jian Zhang, G. Micheli","doi":"10.7873/DATE.2014.248","DOIUrl":"https://doi.org/10.7873/DATE.2014.248","url":null,"abstract":"Field-Effect Transistors (FETs) with on-line controllable-polarity are promising candidates to support next generation System-on-Chip (SoC). Thanks to their enhanced functionality, controllable-polarity FETs enable a superior design of critical components in a SoC, such as processing units and memories, while also providing native solutions to control power consumption. In this paper, we present the efficient design of a SoC core with controllable-polarity FET. Processing units are speeded-up at the datapath level, as arithmetic operations require fewer physical resources than in standard CMOS. Power consumption is decreased via embedded power-gating techniques and tunable high-performance/low-power devices operation. Memory cells are made smaller by merging the access interface with the storage circuitry. We foresee the advantages deriving from these techniques, by evaluating their impact on the design of SoC for a contemporary telecommunication application. Using a 22-nm vertically-stacked silicon nanowire technology, a coarse-grain evaluation at the block level estimates a delay and power reduction of 20% and 19% respectively, at a cost of a moderate area overhead of 15%, with respect to a state-of-art FinFET technology.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"104 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83627578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 35