{"title":"Implicit hints: Embedding hint bits in programs without ISA changes","authors":"H. Vandierendonck, K. D. Bosschere","doi":"10.1109/ICCD.2010.5647699","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647699","url":null,"abstract":"There is a large gap in knowledge about a program between the compiler, which can afford expensive analysis, and the processor, which by nature is constrained in the types of analysis it can perform. To increase processor performance, ISAs have been extended with hint bits to communicate some of the compiler's knowledge to the processor.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122058747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A control-theoretic energy management for fault-tolerant hard real-time systems","authors":"Ali Sharif Ahmadian, Mahdieh Hosseingholi, A. Ejlali","doi":"10.1109/ICCD.2010.5647798","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647798","url":null,"abstract":"Recently, the tradeoff between low energy consumption and high fault-tolerance has attracted much attention as a key issue in the design of real-time embedded systems. Dynamic Voltage Scaling (DVS) is one of the most effective low-energy techniques for real-time systems, and it has been observed that control-theoretic methods can improve the effectiveness of DVS-enabled systems. In this paper, we investigate reducing the energy consumption of fault-tolerant hard real-time systems using feedback control theory. Our proposed feedback-based DVS method enables the system to select the proper frequency and voltage settings to reduce energy consumption while guaranteeing hard real-time requirements in the presence of unpredictable workload fluctuations and faults. In the proposed method, the available slack time is exploited by feedback-based DVS at runtime to reduce energy consumption, and some slack time is reserved for re-execution in case of faults. Simulation results show that, compared with traditional DVS methods without fault-tolerance, our proposed approach not only significantly reduces energy consumption but also satisfies hard real-time constraints in the presence of faults. The transition overhead (both time and energy) caused by changing the system supply voltage is also taken into account in our simulation experiments.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123957828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent additions to the ARMv7-A architecture","authors":"D. Brash","doi":"10.1109/ICCD.2010.5647549","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647549","url":null,"abstract":"This talk will be based on recent announcements regarding new address translation and virtualization support in the ARM architecture. It will also raise awareness around the promise of new power/performance points and opportunities that are expected from the first ARM implementation, the Cortex-A15 core.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126476389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Package-Aware Scheduling of embedded workloads for temperature and Energy management on heterogeneous MPSoCs","authors":"Shervin Sharifi, T. Simunic","doi":"10.1109/ICCD.2010.5647628","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647628","url":null,"abstract":"In this paper, we present PASTEMP, a solution for Package-Aware Scheduling for Thermal and Energy management using Multi-Parametric programming in heterogeneous embedded multiprocessor SoCs (MPSoCs). Based on the current thermal state of the system and the current performance requirements of the workload, PASTEMP finds thermally safe and energy-efficient voltage/frequency configurations for the cores on an MPSoC. Tasks are assigned to cores depending on their performance demand and the current voltage/frequency of each core. The voltage/frequency settings of the cores are chosen through an optimization process based on an instantaneous thermal model we introduce to decouple the effect of package temperature from the temperature changes caused by the power consumption of the cores. To find the best voltage/frequency settings at runtime, we use multi-parametric programming to separate the optimization into offline and online phases. According to our experimental results, compared to similar DTM techniques, PASTEMP yields up to 23% energy savings and 26% throughput improvement and reduces deadline misses by more than half while meeting all thermal constraints.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124282047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An energy model for graphics processing units","authors":"Jeff Pool, A. Lastra, Montek Singh","doi":"10.1109/ICCD.2010.5647678","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647678","url":null,"abstract":"We present an energy model for a graphics processing unit (GPU) that is based on the amount and type of work performed in various parts of the unit. By designing and running directed tests on a GPU, we measure the energy consumed when performing different arithmetic and memory operations, allowing us to accurately predict the energy that any arbitrary mix of operations will take. With some knowledge of how data travels through and is transformed by the graphics pipeline, we can predict how many of each operation will occur for a given scene, leading to an estimate of the energy usage. We validate our model against different types of existing graphical applications. With an average difference of 3% from measured energy under typical workloads, our model can be used for various purposes. In this work, we explore and present two use cases: 1) predicting the energy performance of applications on a different architecture, and 2) exploring the energy efficiency of different algorithms to achieve the same graphical effect.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"446 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123055405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spintronic logic gates for spintronic data using magnetic tunnel junctions","authors":"S. Patil, A. Lyle, J. Harms, D. Lilja, Jianping Wang","doi":"10.1109/ICCD.2010.5647611","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647611","url":null,"abstract":"The emerging field of spintronics is undergoing exciting developments with the recent advances in spintronic devices, such as magnetic tunnel junctions (MTJs). While MTJs make excellent memory devices, they have recently also been used to implement logic functions. The properties of MTJs differ greatly from those of electronic devices such as CMOS transistors, which makes it challenging to design circuits that can efficiently leverage their spintronic capabilities. Current approaches to achieving logic functionality with MTJs integrate CMOS and MTJ circuits, where CMOS devices implement the required intermediate read and write circuitry. The problem with this approach is that such intermediate circuitry adds area, delay, and power overheads to the logic circuit. In this paper, we present a circuit that performs logic operations using MTJs on data stored in other MTJs, without intermediate electronic circuitry. This reduces the performance overheads of the spintronic circuit while also simplifying fabrication. With this circuit, we discuss the notion of performing logic operations within a non-volatile memory device and compare it with the traditional method of computation with separate logic and memory units. We find that the MTJ-based logic unit has the potential to offer higher energy-delay efficiency than a CMOS-based logic operation on data stored in a separate memory module.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122891529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study on performance benefits of core morphing in an asymmetric multicore processor","authors":"Anup Das, Rance Rodrigues, I. Koren, S. Kundu","doi":"10.1109/ICCD.2010.5647566","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647566","url":null,"abstract":"Multicore architectures are designed to provide an acceptable level of performance per unit power for the majority of applications. Consequently, we must occasionally expect applications that could have benefited from a more powerful core, in terms of either lower execution time and/or lower energy consumed. Fusing some of the resources of two (or more) cores to configure a more powerful core for such instances is a natural approach to deal with those few applications that have very high performance demands. However, a recent study has shown that fusing homogeneous cores is unlikely to benefit applications. In this paper we study the potential performance benefits of core morphing in a heterogeneous multicore processor that can be reconfigured at runtime. We consider as an example a dual-core processor with one of the two cores designed to target integer-intensive applications while the other is better suited to floating-point-intensive applications. These two cores can be fused into a single powerful core when an application that can benefit from such fusion is executing. We first discuss the design principles of the two individual cores so that the majority of the benchmarks that we consider execute in a satisfactory way. We then show that a small subset of the considered applications can greatly benefit from core morphing, even in the case where two applications that could have been executed in parallel on the two cores are run, for some percentage of time, on the single morphed core. Our results indicate that a performance gain of up to 100% is achievable at a small hardware overhead of less than 1%.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126775871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Insertion policy selection using Decision Tree Analysis","authors":"S. Khan, Daniel A. Jiménez","doi":"10.1109/ICCD.2010.5647608","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647608","url":null,"abstract":"The last-level cache (LLC) mitigates the impact of long memory access latencies in today's microarchitectures. The insertion policy in the LLC has a significant impact on cache efficiency. A fixed insertion policy can allow useless blocks to remain in the cache longer than necessary, resulting in inefficiency. We introduce insertion policy selection using Decision Tree Analysis (DTA). The technique requires minimal hardware modification over the least-recently-used (LRU) replacement policy. This policy exploits the fact that the LLC filters temporal locality: many of the lines brought into the cache are never accessed again, and even when they are reaccessed they do not experience bursts, but rather are reused when they are near the LRU position in the LRU stack. We use decision tree analysis of multi-set-dueling to choose the optimal insertion position in the LRU stack. Inserting in this position, zero-reuse lines minimize their dead time while non-zero-reuse lines remain in the cache long enough to be reused and avoid a miss. For a 1MB, 16-way set-associative last-level cache in a single-core processor, our policy uses only 2,069 additional bits over the LRU replacement policy. On average it reduces misses by 5.16% and achieves a 7.19% IPC improvement over LRU.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126552643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A lightweight run-time scheduler for multitasking multicore stream applications","authors":"Michael A. Baker, Karam S. Chatha","doi":"10.1109/ICCD.2010.5647732","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647732","url":null,"abstract":"Stream programming models promise dramatic improvements in developers' ability to express parallelism in their applications while enabling extremely efficient implementations on modern many-core processors. Unfortunately, the wide variation in the architectural features of available multi-core processors implies that a single compiler may be incapable of generating general solutions that can run on many target systems, or even on different configurations of the same system. In particular, offline approaches for finding optimal mappings and schedules for a stream program on a specific processor are limited by their lack of portability across different processors, and by a lack of flexibility for runtime variations in resource availability in typical multitasking environments. The paper presents a scheme that includes a lightweight compile-time sequencer and a dynamic scheduler capable of mapping stream programs onto the available cores of a multi-core processor at runtime. Unlike previous implementations, our scheme requires limited knowledge of the target architecture's resources at compile time. The offline portion of the scheme generates canonical scheduling information about the stream program. This information is utilized by the lightweight run-time scheduling algorithm to generate application mappings in linear time based on available resources, giving near-optimal throughput. Evaluations of schedules generated for twelve streaming benchmarks give an average of 96% and 93% of the theoretical optimum throughput for schedules with up to 4 and 128 cores, respectively.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126590098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A simple pipelined logarithmic multiplier","authors":"P. Bulić, Z. Babic, A. Avramović","doi":"10.1109/ICCD.2010.5647767","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647767","url":null,"abstract":"Digital signal processing algorithms often rely heavily on a large number of multiplications, which is both time and power consuming. However, there are many practical solutions to simplify multiplication, such as truncated and logarithmic multipliers. These methods consume less time and power but introduce errors. Nevertheless, they can be used in situations where a shorter time delay is more important than accuracy. In digital signal processing, these conditions are often met, especially in video compression and tracking, where integer arithmetic gives satisfactory results. This paper presents and compares different multipliers in a logarithmic number system. For the hardware implementation assessment, the multipliers are implemented on a Spartan 3 FPGA chip and compared in terms of speed, resources required for implementation, power consumption, and error rate. We also propose a simple and efficient logarithmic multiplier that can achieve arbitrary accuracy through an iterative procedure. In this way, the error correction can be done almost in parallel with the basic multiplication (in practice, through pipelining). The hardware solution involves only adders and shifters, so it consumes little area and power. For operands ranging from 8 to 16 bits, the proposed multiplier exhibits a very low relative error percentage.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124239359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}