A. Agarwal, S. Mathew, S. Hsu, M. Anders, Himanshu Kaul, F. Sheikh, R. Ramanarayanan, S. Srinivasan, R. Krishnamurthy, S. Borkar
{"title":"A 320mV-to-1.2V on-die fine-grained reconfigurable fabric for DSP/media accelerators in 32nm CMOS","authors":"A. Agarwal, S. Mathew, S. Hsu, M. Anders, Himanshu Kaul, F. Sheikh, R. Ramanarayanan, S. Srinivasan, R. Krishnamurthy, S. Borkar","doi":"10.1109/ISSCC.2010.5433903","DOIUrl":null,"url":null,"abstract":"Computationally intensive DSP/media processing applications require specialized hardware accelerators to enable higher energy-efficiency on microprocessor platforms. On-die reconfigurable arrays enable flexible accelerators with dynamic on-the-fly programmability while amortizing die area and time-to-market costs across a wide range of workloads. An ultra-low-voltage fine-grained reconfigurable fabric consisting of a hybrid configurable logic block (CLB) array with process/voltage/temperature (PVT) variation-tolerant register file (Fig. 18.2.1), targeted for on-die acceleration of DSP/media algorithms on power-constrained mobile microprocessors, is fabricated in 32nm high-k/metal-gate CMOS [1]. The CLB combines self-decoded look-up tables (LUTs) for random logic with reconfigurable arithmetic building blocks, hybrid 3∶2 compressors with integrated partial product generation, configurable adder/multiplier carry propagation and optimized CLB input/output multiplexers to achieve peak energy-efficiency of 2.6TOPS/W measured at 340mV, 50°C. The register file includes programmable stacked shared keepers and interruptible operation of both write memory cells and set-dominant latches (SDLs), improving Vcc-min by 300mV across PVT variations with a wide dynamic operating range of 320mV–1.2V, enabling simultaneous dynamic supply/frequency optimization across target workloads and power budgets. These features also achieve: (i) nominal CLB performance of 2.4GHz, 5.3mW measured at 1.0V, (ii) robust CLB functionality measured at 260mV, 27MHz (sub-threshold) consuming 12µW, (iii) scalable register file performance up to 8.2GHz, 125mW measured at 1.2V, 50°C with low-voltage near-threshold operation at 320mV, 252MHz consuming 430µW, (iv) 4-tap FIR filter, radix-2 FFT butterfly and 16b string-match algorithms with peak throughput of 2.1GSamples/s, 2.4GSamples/s and 100Gbps respectively, and (v) application-dependent dual-supply power savings up to 34%.","PeriodicalId":6418,"journal":{"name":"2010 IEEE International Solid-State Circuits Conference - (ISSCC)","volume":"47 1","pages":"328-329"},"PeriodicalIF":0.0000,"publicationDate":"2010-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Solid-State Circuits Conference - (ISSCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSCC.2010.5433903","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 40
Abstract
Computationally intensive DSP/media processing applications require specialized hardware accelerators to enable higher energy-efficiency on microprocessor platforms. On-die reconfigurable arrays enable flexible accelerators with dynamic on-the-fly programmability while amortizing die area and time-to-market costs across a wide range of workloads. An ultra-low-voltage fine-grained reconfigurable fabric consisting of a hybrid configurable logic block (CLB) array with process/voltage/temperature (PVT) variation-tolerant register file (Fig. 18.2.1), targeted for on-die acceleration of DSP/media algorithms on power-constrained mobile microprocessors, is fabricated in 32nm high-k/metal-gate CMOS [1]. The CLB combines self-decoded look-up tables (LUTs) for random logic with reconfigurable arithmetic building blocks, hybrid 3∶2 compressors with integrated partial product generation, configurable adder/multiplier carry propagation and optimized CLB input/output multiplexers to achieve peak energy-efficiency of 2.6TOPS/W measured at 340mV, 50°C. The register file includes programmable stacked shared keepers and interruptible operation of both write memory cells and set-dominant latches (SDLs), improving Vcc-min by 300mV across PVT variations with a wide dynamic operating range of 320mV–1.2V, enabling simultaneous dynamic supply/frequency optimization across target workloads and power budgets. These features also achieve: (i) nominal CLB performance of 2.4GHz, 5.3mW measured at 1.0V, (ii) robust CLB functionality measured at 260mV, 27MHz (sub-threshold) consuming 12µW, (iii) scalable register file performance up to 8.2GHz, 125mW measured at 1.2V, 50°C with low-voltage near-threshold operation at 320mV, 252MHz consuming 430µW, (iv) 4-tap FIR filter, radix-2 FFT butterfly and 16b string-match algorithms with peak throughput of 2.1GSamples/s, 2.4GSamples/s and 100Gbps respectively, and (v) application-dependent dual-supply power savings up to 34%.