2015 33rd IEEE International Conference on Computer Design (ICCD): Latest Papers

Power and performance characterization, analysis and tuning for energy-efficient edge detection on atom and ARM based platforms

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date: 2015-10-18 DOI: 10.1109/ICCD.2015.7357153

P. Otto, Maria Malik, N. Akhlaghi, Rebel Sequeira, H. Homayoun, S. Sikdar

Abstract: The de facto standard embedded platforms for medium to low computing demands are multicore ARM (Thumb ISA) and Intel Atom (x86 ISA) processors. Operating these architectures in the milliwatt range while running real-time computer vision corner detection algorithms is a challenging problem. We present an analysis of power, performance, and energy-efficiency measurements of Harris corner detection across a wide range of voltage and frequency settings, multicore/multithreading strategies, and compiler and application optimization parameters, to determine how the interplay of these parameters affects power, performance, and energy efficiency. Our measurements on state-of-the-art embedded platforms demonstrate that systematic cross-layer optimization at the application level (Sobel filter type, aperture size, number of image tiles), the compiler level (branch prediction, function inlining), and the system level (voltage and frequency setting, single-core vs. multicore implementation) significantly improves the energy efficiency of corner detection while meeting its real-time performance constraints. This cross-layer optimization improves the energy efficiency of Harris corner detection on Atom and ARM by 89.5% and 87.2%, respectively.
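The abstract names Harris corner detection and its Sobel gradient step without showing the computation. The plain-Python sketch below computes the Harris response R = det(M) - k·trace(M)² from 3×3 Sobel gradients summed over a square window. It assumes a fixed 3×3 aperture, k = 0.04, a window radius of 1, and zero gradients on the image border; the function names are hypothetical and this is not the authors' tuned implementation. OpenCV exposes the same knobs through cv2.cornerHarris(src, blockSize, ksize, k).

```python
def sobel_gradients(img):
    """3x3 Sobel derivatives of a 2-D list of floats; border pixels stay zero."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        up, mid, down = img[y - 1], img[y], img[y + 1]
        for x in range(1, w - 1):
            # Horizontal derivative: right column minus left column, weights 1-2-1.
            gx[y][x] = (up[x + 1] + 2 * mid[x + 1] + down[x + 1]) - (
                up[x - 1] + 2 * mid[x - 1] + down[x - 1])
            # Vertical derivative: bottom row minus top row, weights 1-2-1.
            gy[y][x] = (down[x - 1] + 2 * down[x] + down[x + 1]) - (
                up[x - 1] + 2 * up[x] + up[x + 1])
    return gx, gy


def harris_response(img, k=0.04, radius=1):
    """Harris response per pixel: det(M) - k * trace(M)^2 over a (2r+1)^2 window."""
    gx, gy = sobel_gradients(img)
    h, w = len(img), len(img[0])
    r = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sxx = syy = sxy = 0.0
            # Sum the structure-tensor entries over the window, clipped at the border.
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    dx, dy = gx[yy][xx], gy[yy][xx]
                    sxx += dx * dx
                    syy += dy * dy
                    sxy += dx * dy
            r[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return r
```

Positive R marks corners, negative R marks edges, and R near zero marks flat regions. Each output pixel depends only on the input image, so rows can be split into tiles and processed by separate cores.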

Citations: 5

Performance optimization for on-chip sensors to detect recycled ICs

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date: 2015-10-18 DOI: 10.1109/ICCD.2015.7357116

Bicky Shakya, Ujjwal Guin, M. Tehranipoor, Domenic Forte

Citations: 10

InvArch: A hardware efficient architecture for Matrix Inversion

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date: 2015-10-18 DOI: 10.1109/ICCD.2015.7357100

Umer I. Cheema, G. Nash, R. Ansari, A. Khokhar

Abstract: This paper proposes an efficient architecture (InvArch) for computing matrix inversion using the Gauss-Jordan elimination method. The proposed architecture exploits parallelism through pipelined floating-point computational units and requires fewer floating-point multiplication units than existing pipelined implementations. This reduction in multiplication units cuts the hardware for floating-point computation units by over 80%. The architecture performs in-place inversion and scales across rows and columns. Hardware efficiency comes from exploiting the regularity of the computation and from better utilization of pipelined computational resources. Multiple rows are normalized within a single iteration of the Gauss-Jordan algorithm, which reduces the number of floating-point multiplication units needed in the elimination step. In addition to implementing the architecture, an analytical performance model is developed for InvArch and several related works. InvArch achieves performance comparable to reference architectures in clock cycles and throughput while using significantly fewer hardware resources.
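For reference, the plain-Python sketch below shows the textbook sequential in-place Gauss-Jordan inversion, with its normalization and elimination steps, overwriting the input matrix with its inverse. It assumes nonzero pivots (no row pivoting, as with diagonally dominant inputs) and normalizes one row per iteration, whereas InvArch normalizes multiple rows per iteration in pipelined hardware; the function name is hypothetical.

```python
def gauss_jordan_inverse_in_place(a):
    """Overwrite the square matrix `a` (list of lists of floats) with its inverse.

    Sketch only: no pivoting, so every pivot encountered must be nonzero.
    """
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: this sketch does not reorder rows")
        # Normalization step: storing 1 in the pivot slot before dividing
        # leaves 1/pivot there, which is the inverse's entry for that slot.
        a[k][k] = 1.0
        for j in range(n):
            a[k][j] /= pivot
        # Elimination step: clear column k from every other row, again
        # reusing the cleared slot to accumulate the inverse in place.
        for i in range(n):
            if i == k:
                continue
            factor = a[i][k]
            a[i][k] = 0.0
            for j in range(n):
                a[i][j] -= factor * a[k][j]
    return a
```

For example, inverting [[4, 7], [2, 6]] yields [[0.6, -0.7], [-0.2, 0.4]] up to rounding. No separate identity matrix is allocated, which is the storage saving an in-place design relies on.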

Citations: 5

Exploring early and late ALUs for single-issue in-order pipelines

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date: 2015-10-18 DOI: 10.1109/ICCD.2015.7357163

Alen Bardizbanyan, P. Larsson-Edefors

Abstract: In-order processors are key components in energy-efficient embedded systems. One important design aspect of in-order pipelines is the sequence of pipeline stages. First, the position of the execute stage, in which arithmetic logic unit (ALU) operations and branch prediction are handled, affects the number of stall cycles caused by data dependencies between data memory instructions and their consuming instructions, and by address generation instructions that depend on an ALU result. Second, the position of the ALU inside the pipeline affects the branch penalty. This paper considers how best to use ALU resources inside a single-issue in-order pipeline. We begin by analyzing the most efficient placement of a single ALU in an in-order pipeline. We then evaluate the most efficient way to use two ALUs, one early and one late, a technique that has revitalized commercial in-order processors in recent years. Our architectural simulations, based on 20 MiBench and 7 SPEC2000 integer benchmarks and a 65-nm post-layout netlist of a complete pipeline, show that using two ALUs in different pipeline stages gives better performance and energy efficiency than any pipeline configuration with a single ALU.
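The stall trade-off described in the abstract can be illustrated with a toy model. The Python sketch below assumes hypothetical stage numbers (stage 0 is fetch, address generation is in stage 2, load data is available at the end of stage 3, and the single ALU sits in stage 2 when "early" or stage 4 when "late"), full forwarding, and a trace containing only loads and ALU operations. It counts stall cycles from operand readiness alone and ignores branches, so it is not the authors' simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical stage indices (stage 0 = fetch) of a single-issue in-order pipeline."""
    agu_stage: int  # stage that computes load addresses (needs the address operand)
    mem_stage: int  # stage at whose end load data becomes available
    alu_stage: int  # stage holding the single ALU (reads and produces operands there)


EARLY_ALU = PipelineConfig(agu_stage=2, mem_stage=3, alu_stage=2)
LATE_ALU = PipelineConfig(agu_stage=2, mem_stage=3, alu_stage=4)


def count_stalls(trace, cfg):
    """Count stall cycles for a trace of (kind, dst, srcs), kind in {"load", "alu"}.

    Assumes full forwarding: a value produced at the end of a stage in cycle c
    can be consumed by any stage in cycle c + 1.
    """
    ready = {}  # register -> first cycle in which its value can be consumed
    issue = []  # cycle in which each instruction enters stage 0
    for kind, dst, srcs in trace:
        if kind == "load":
            need, produce = cfg.agu_stage, cfg.mem_stage
        elif kind == "alu":
            need, produce = cfg.alu_stage, cfg.alu_stage
        else:
            raise ValueError(f"unsupported instruction kind: {kind}")
        # In order: at most one new instruction enters the pipeline per cycle.
        start = issue[-1] + 1 if issue else 0
        for reg in srcs:
            if reg in ready:
                # Delay entry so the operand is ready when this instruction reaches `need`.
                start = max(start, ready[reg] - need)
        issue.append(start)
        ready[dst] = start + produce + 1
    # Every cycle beyond one instruction per cycle is a stall.
    return issue[-1] - (len(trace) - 1) if trace else 0
```

Under these assumptions, a load followed by a dependent ALU operation stalls one cycle with the early ALU and none with the late ALU, while an ALU result feeding a load address stalls none with the early ALU and two cycles with the late ALU. This tension is consistent with the motivation for pairing an early and a late ALU.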

Citations: 1

VLSI implementation of high-throughput, low-energy, configurable MIMO detector

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date: 2015-10-18 DOI: 10.1109/ICCD.2015.7357162

P. Chuang, M. Sachdev, V. Gaudet

Citations: 2

Energy-optimal voltage model supporting a wide range of nodal switching rates for early design-space exploration

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date: 2015-10-18 DOI: 10.1109/ICCD.2015.7357129

Doyun Kim, Jiangyi Li, Mingoo Seok

Citations: 2

Exploring the viability of stochastic computing

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date: 2015-10-18 DOI: 10.1109/ICCD.2015.7357131

Joao Marcos de Aguiar, S. Khatri

Abstract: Stochastic circuits have recently received significant attention from academia. They are claimed to reduce energy consumption at the cost of accuracy and delay. In this paper, we explore the power, delay, energy, and area of a stochastic circuit (specifically a stochastic multiplier) and compare these metrics with those of a regular multiplier implemented using the sum-of-products (SOP) approach. The SOP-based multiplier is implemented with both a Kogge-Stone adder and a ripple-carry adder. Our results show that when the stochastic number generator (SNG) and counter are included in the stochastic multiplier (SM), the SM consumes more energy per multiplication than an SOP-based regular binary multiplier (RM) even at 3 bits, and this energy grows exponentially with the number of bits. If we consider only the stochastic multiplier cell (SMC, which is simply a 2-input AND gate) and ignore the energy of the SNG and counter, the SMC consumes less energy for multiplications of up to 12 bits. However, even at 3 bits, the SM (or the SMC) is over 5x slower than the regular multiplier, and this delay grows exponentially with the number of bits. The area of the SM (including the SNG and counter) is smaller for multipliers wider than 6 bits.
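The stochastic multiplier cell described above is just an AND gate on two unipolar bitstreams, where a value is the fraction of ones in its stream. The Python sketch below multiplies two n-bit fractions that way. To make the product exact and the example deterministic, it swaps in a clock-division encoding (one stream repeats its pattern every 2^n cycles, the other holds each bit for 2^n cycles) instead of the LFSR-based SNGs commonly used, so it is not necessarily the SNG the authors evaluated. The running count of ones plays the role of the output counter; the function name is hypothetical.

```python
def stochastic_multiply(a: int, b: int, bits: int) -> tuple[int, int]:
    """Multiply the fractions a/2^bits and b/2^bits with an AND gate on bitstreams.

    Returns (ones, length); ones/length equals (a/2^bits) * (b/2^bits).
    """
    n = 1 << bits
    if not (0 <= a < n and 0 <= b < n):
        raise ValueError("operands must fit in `bits` unsigned bits")
    length = n * n  # stream length grows as 2^(2*bits)
    ones = 0        # the output counter
    for i in range(length):
        bit_a = (i % n) < a   # stream A: a ones per n cycles, repeated
        bit_b = (i // n) < b  # stream B: each bit held for n cycles
        ones += bit_a & bit_b  # the stochastic multiplier cell: a 2-input AND
    return ones, length
```

For 3-bit operands 5/8 and 3/8, the call stochastic_multiply(5, 3, 3) counts 15 ones in a 64-cycle stream, i.e. 15/64. The 2^(2n)-cycle stream length of this encoding shows why the delay of stochastic multiplication grows so quickly with operand width.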

Citations: 12

Power management of pulsed-index communication protocols

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date: 2015-10-18 DOI: 10.1109/ICCD.2015.7357127

Shahzad Muzaffar, I. Elfadel

Citations: 8

Improving reliability, performance, and energy efficiency of STT-MRAM with dynamic write latency

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date: 2015-10-18 DOI: 10.1109/ICCD.2015.7357091

A. Ahari, Mojtaba Ebrahimi, Fabian Oboril, M. Tahoori

Abstract: High write latency and high write energy are the major challenges in Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) design. The write operation in STT-MRAM is stochastic in nature, so it requires a very long timing margin to maintain acceptable reliability and yield. Traditionally, error correction codes (ECCs) are used to reduce the timing margin in STT-MRAM, but they impose high storage and latency overheads. In this paper, we propose a low-cost architecture-level technique that significantly reduces the required timing margin. The technique uses a handshaking protocol between the memory and its controller to determine the write latency dynamically at run time. Our simulation infrastructure comprehensively models the combined effect of process variation and stochastic write behavior at the circuit level and abstracts it to the architecture level. The simulation results show that the proposed technique not only considerably reduces the write error rate but also improves overall system performance by 15.4% on average compared with existing solutions.
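To make the contrast between a fixed timing margin and a handshake concrete, the Python sketch below evaluates two write policies on a list of per-write switching times measured in cycles. All of it is a hypothetical model: the fixed policy always waits fixed_latency cycles and fails any write that needs longer; the handshake policy waits until the memory signals completion, rounded up to whole cycles, plus a fixed handshake overhead, and gives up at max_latency. It does not model process variation or the authors' circuit-level simulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyResult:
    errors: int          # writes not yet switched when the controller stopped waiting
    mean_latency: float  # average cycles the controller spent per write


def fixed_margin(switch_times, fixed_latency):
    """Worst-case-margin policy: every write occupies `fixed_latency` cycles."""
    if not switch_times:
        raise ValueError("need at least one write")
    errors = sum(1 for t in switch_times if t > fixed_latency)
    return PolicyResult(errors, float(fixed_latency))


def handshake(switch_times, max_latency, overhead):
    """Hypothetical handshake policy: wait for a completion signal, capped at max_latency."""
    if not switch_times:
        raise ValueError("need at least one write")
    errors = 0
    total = 0
    for t in switch_times:
        if t > max_latency:
            errors += 1                       # timed out before the cell switched
            total += max_latency + overhead
        else:
            total += math.ceil(t) + overhead  # completion seen at the next cycle edge
    return PolicyResult(errors, total / len(switch_times))
```

For switching times [3.2, 5.0, 12.5, 4.1], a 10-cycle fixed margin fails one write at 10 cycles per write, while a handshake capped at 20 cycles with a 1-cycle overhead fails none at 7.75 cycles per write on average. Switching times could be drawn from a distribution such as random.lognormvariate(mu, sigma); that choice is an assumption here, not data from the paper.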

Citations: 21

A methodology for power characterization of associative memories

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date: 2015-10-18 DOI: 10.1109/ICCD.2015.7357156

Dawei Li, S. Joshi, S. Memik, J. Hoff, S. Jindariani, Tiehui Liu, J. Olsen, N. Tran

Abstract: Content addressable memories (CAMs) have become increasingly important in applications requiring high-speed memory search, owing to their inherently massively parallel architecture. We present a complete power analysis methodology for CAM systems to aid the exploration of their power-performance trade-offs in future systems. The methodology uses detailed transistor-level circuit simulation of power behavior and a handful of input data types to simulate full-chip power consumption. We applied it to a custom-designed associative memory test chip, developed by Fermilab for high-performance real-time pattern recognition on the high-volume data produced by a future large-scale scientific experiment, and used it to configure a power model for this chip. The model predicts total average power to within 4% of actual power measurements. The methodology can be generalized to other CAM-like memory systems to accurately characterize their power behavior.
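The methodology is described only at a high level. As a rough illustration of how characterized per-event energies can be combined with a workload into an average-power estimate, the Python sketch below uses a hypothetical energy table in which each searched row costs a different energy depending on whether its matchline stays charged (match) or discharges (mismatch). It assumes one operation per cycle and adds a constant leakage term. The match/mismatch split stands in for the paper's "input data types"; the table values, names, and model structure are assumptions, not the authors' model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CamEnergyTable:
    """Hypothetical per-event energies (joules) that a characterization step might supply."""
    search_match: float     # energy per searched row whose matchline stays charged
    search_mismatch: float  # energy per searched row whose matchline discharges
    write: float            # energy per word write
    leakage_power: float    # static power of the whole array (watts)


def cam_search(words, key):
    """Compare the key with every stored word (all rows at once in hardware)."""
    return [word == key for word in words]


def average_power(words, ops, table, cycle_time):
    """Event-driven estimate: summed dynamic energy over the trace, plus leakage.

    `ops` is a list of ("search", key) or ("write", (index, value)) tuples,
    one per cycle of length `cycle_time` seconds.
    """
    if not ops:
        raise ValueError("need at least one operation")
    words = list(words)  # simulate on a copy so the caller's contents are untouched
    energy = 0.0
    for op, arg in ops:
        if op == "search":
            for hit in cam_search(words, arg):
                energy += table.search_match if hit else table.search_mismatch
        elif op == "write":
            index, value = arg
            words[index] = value
            energy += table.write
        else:
            raise ValueError(f"unknown operation: {op}")
    return energy / (len(ops) * cycle_time) + table.leakage_power
```

Because search energy depends on how many rows match, the same CAM can draw different power for different input data, which is why a characterization over several input data types is needed before predicting full-chip power.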

Citations: 4