2016 IEEE 34th International Conference on Computer Design (ICCD): Latest Papers

Pull-off buffer: Borrowing cache space to avoid deadlock for fault-tolerant NoC routing

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753328

Airan Shao, Dongsheng Wang, Haixia Wang

Abstract: Advances in semiconductor technology have led to large chip multiprocessors (CMPs) that employ a network-on-chip (NoC) to provide scalable on-chip communication. This higher integration capacity, however, also increases the likelihood of faults. Fault-tolerant NoC routing, which routes packets around faulty network components and maintains normal communication, is therefore essential. When a large number of faults must be tolerated, though, deadlock becomes very difficult to handle. Existing highly fault-tolerant routing solutions rely on virtual channels (VCs) or topology-agnostic routing to avoid deadlock, at the cost of lower network performance and extra hardware. In this paper, we show that a highly fault-tolerant routing method can be designed without VCs or topology-agnostic routing. We present the pull-off buffer (POB), a FIFO buffer that borrows space already present in the cache to avoid potential deadlocks. POBs borrow cache space only at selected nodes and only after faults occur; caches at other nodes are unaffected. Experimental results show that, compared with existing highly fault-tolerant routing methods that use VCs or topology-agnostic routing, our solution provides 2x to 3x higher network throughput and reduces router area and power overhead.

Citations: 3
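To illustrate why an extra buffer can break a routing deadlock, here is a minimal Python sketch under loudly stated assumptions: a unidirectional ring of routers with one-slot input buffers, packets that always move one hop clockwise, and a single hypothetical pull-off FIFO at one selected node that parks the packet held there whenever the whole network stalls. The function `simulate`, its parameters, the stall trigger, and the reinjection rule are illustrative inventions, not the paper's POB control logic; in the paper the FIFO's storage is borrowed cache space, which this toy does not model.

```python
from collections import deque


def simulate(n, dests, pob_node=None, pob_capacity=0, max_steps=1000):
    """Toy ring NoC (hypothetical model). dests[i] is the destination router of
    the packet buffered at router i, or None. Returns the step at which all
    packets are delivered, or None if a step makes no progress (deadlock) or
    max_steps is exceeded."""
    buf = list(dests)                      # one-slot input buffer per router
    pob = deque()                          # hypothetical pull-off FIFO at pob_node
    total = sum(d is not None for d in dests)
    delivered = 0
    for step in range(1, max_steps + 1):
        progress = False
        # 1. Eject packets that have reached their destination router.
        for i in range(n):
            if buf[i] is not None and buf[i] == i:
                buf[i] = None
                delivered += 1
                progress = True
        if delivered == total:
            return step
        # 2. Move each packet one hop clockwise if the next buffer is free now.
        free = [b is None for b in buf]
        new_buf = list(buf)
        for i in range(n):
            nxt = (i + 1) % n
            if buf[i] is not None and free[nxt]:
                new_buf[nxt], new_buf[i] = buf[i], None
                progress = True
        # 3. Pull-off buffer (assumed policy): reinject a parked packet when the
        #    node's buffer is empty; if nothing else moved, park the blocked
        #    packet held at that node to break the cyclic wait.
        if pob_node is not None:
            if new_buf[pob_node] is None and pob:
                new_buf[pob_node] = pob.popleft()
                progress = True
            elif (not progress and new_buf[pob_node] is not None
                  and len(pob) < pob_capacity):
                pob.append(new_buf[pob_node])
                new_buf[pob_node] = None
                progress = True
        buf = new_buf
        if not progress:
            return None                    # every packet waits on a full buffer
    return None
```

With four routers each holding a packet two hops from its destination (`simulate(4, [2, 3, 0, 1])`), the plain ring stalls on the first step and returns `None`; adding a one-entry pull-off FIFO at router 0 lets the cycle drain, and in this toy all four packets are delivered at step 10.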

WILD: A workload-based learning model to predict dynamic delay of functional units

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753279

Xun Jiao, Yu Jiang, Abbas Rahimi, Rajesh K. Gupta

Abstract: Dynamic critical path analysis in modern processors is needed to reduce the margins typically determined by static timing analysis. Dynamic path analysis, however, is cost-prohibitive. In this paper, we propose WILD, a supervised learning model that predicts the dynamic delay of functional units (FUs) from the input workload during execution. We measure dynamic delay using switching activity generated through gate-level simulation of a post-place-and-route design in a TSMC 45 nm process. We then look for features in the input data that influence dynamic path sensitization. Using these features, we apply logistic regression (LR) to construct a predictive model, trained and tested on three datasets: random, Sobel filter, and Gaussian filter. We classify dynamic delay into five distinct classes; for a given test input, WILD predicts the class of the output dynamic delay. On average across several FUs, 98.0% of WILD's predictions are consistent with gate-level simulation. WILD-directed dynamic frequency scaling can improve instruction-level performance by 13% to 44% compared with a state-of-the-art instruction-level timing model.

Citations: 14
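The WILD abstract describes logistic regression over workload features that predicts one of five delay classes. The plain-Python sketch below shows that general technique, multinomial logistic regression trained by full-batch gradient descent. Everything specific here is hypothetical: the two features (fraction of toggled operand bits and the position of the operand's most significant set bit), the 16-bit operand width, the equal-width binning in `delay_class`, and the toy samples, which are made up for illustration and are not measurements. The paper's actual features, class boundaries, and datasets are not given in the abstract.

```python
import math

NUM_CLASSES = 5   # the abstract reports five dynamic-delay classes
WIDTH = 16        # hypothetical operand width


def delay_class(delay, max_delay):
    """Bin a delay into one of five equal-width classes (assumed binning)."""
    frac = min(max(delay / max_delay, 0.0), 1.0)
    return min(int(frac * NUM_CLASSES), NUM_CLASSES - 1)


def features(prev, cur):
    """Hypothetical workload features for one operand transition."""
    mask = (1 << WIDTH) - 1
    prev, cur = prev & mask, cur & mask
    toggles = bin(prev ^ cur).count("1") / WIDTH   # switching-activity proxy
    msb = cur.bit_length() / WIDTH                 # crude carry-length proxy
    return [1.0, toggles, msb]                     # leading 1.0 is the bias


def probs(W, x):
    """Softmax over the linear class scores W[k] . x."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def loss(W, data):
    """Mean cross-entropy over (features, class) pairs."""
    return -sum(math.log(probs(W, x)[y]) for x, y in data) / len(data)


def train(data, lr=0.5, epochs=200):
    """Full-batch gradient descent on multinomial logistic regression."""
    dim = len(data[0][0])
    W = [[0.0] * dim for _ in range(NUM_CLASSES)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(NUM_CLASSES)]
        for x, y in data:
            p = probs(W, x)
            for k in range(NUM_CLASSES):
                g = p[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    grad[k][j] += g * x[j]
        for k in range(NUM_CLASSES):
            for j in range(dim):
                W[k][j] -= lr * grad[k][j] / len(data)
    return W


def predict(W, x):
    p = probs(W, x)
    return max(range(NUM_CLASSES), key=lambda k: p[k])


# Toy usage: (previous operand, current operand, normalized delay); made-up values.
toy = [(0x0000, 0x0001, 0.10), (0x00FF, 0x00F0, 0.30), (0x0F0F, 0xF0F0, 0.95),
       (0xFFFF, 0x0000, 0.90), (0x1234, 0x1235, 0.15)]
data = [(features(p, c), delay_class(d, 1.0)) for p, c, d in toy]
W = train(data)
```

Starting from all-zero weights the model assigns each class probability 1/5, so the initial loss is ln 5; gradient descent with this small step size lowers it, and `predict(W, features(prev, cur))` then returns the most likely delay class for a new transition.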

A fast, fully verifiable, and hardware predictable ASIC design methodology

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753304

P. Yang, M. Marek-Sadowska

Abstract: In this paper, a fast, fully verifiable, and hardware-predictable ASIC design methodology is proposed and demonstrated for integrated circuits based on the Vertical Slit FET (VeSFET). The key enablers of this methodology are the unique capabilities of pillar-based, two-side-accessible transistor arrays and monolithic 3D integration; VeSFET is a successfully fabricated transistor of this kind. In the proposed methodology, the circuit is first designed on a 3D FPGA platform using a conventional FPGA design flow. With a small additional Back End of Line (BEOL) masking cost, the design implemented on the 3D FPGA is migrated to the final 2D ASIC, which has exactly the same performance, and the verification performed on the 3D FPGA platform remains valid for the final 2D ASIC. Because the 2D ASIC has the same layout as the silicon-proven 3D FPGA, the unpredictable factors of fabrication are greatly mitigated. The proposed methodology retains all the benefits of the FPGA design flow. Eleven MCNC benchmark circuits were implemented. Compared with a 2D FPGA, both the final 2D ASIC implementation and the 3D FPGA design platform are on average 15% faster, consume 17% less power, and occupy 44% less area.

Citations: 2

Voting system design pitfalls: Vulnerability analysis and exploitation of a model platform

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753273

K. Ly, Orlando Arias, Jacob Wurm, Khoa Hoang, Kaveh Shamsi, Yier Jin

Citations: 0

Thermal-aware 3D design for side-channel information leakage

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753336

P. Gu, Dylan C. Stow, Russell Barnes, E. Kursun, Yuan Xie

Citations: 18

Using Provenance to boost the Metadata Prefetching in distributed storage systems

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753264

G. Wu, Yuhui Deng, X. Qin

Abstract: Caching and prefetching are effective approaches to boosting the performance of metadata access in distributed storage systems. Many research efforts have been devoted to developing new metadata prefetching methods that consider past file access patterns. However, existing methods do not consider the correlations between processes and the corresponding files (e.g., file provenance). As a result, they cannot capture rich and accurate correlations, which reduces the effectiveness of metadata prefetching. This paper presents a Provenance-based Metadata Prefetching (ProMP) scheme that considers both provenance and past file access patterns. By mining the correlations between processes and their files from provenance and past access history, ProMP obtains accurate and rich correlation information, which it leverages to apply aggressive metadata prefetching and boost performance. Our experimental results show that ProMP is more effective and has lower memory overhead than existing solutions, improving hit rates by up to 49% and 7% compared with traditional LRU and Nexus, a state-of-the-art metadata prefetching algorithm, respectively.

Citations: 4
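The ProMP abstract contrasts plain LRU caching with prefetching driven by process-to-file correlations. The Python sketch below compares the two on a tiny trace, under assumptions that are not from the paper: the "provenance" table is simply the list of files each process touched in an earlier run (`build_provenance`), and on a miss the cache also prefetches up to `degree` uncached files previously touched by the same process. The names `LRUCache`, `build_provenance`, `hit_rate`, the prefetch rule, and the toy trace are all hypothetical; ProMP's actual mining and prefetching policy is not described in the abstract.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU set of metadata keys (e.g., file paths)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def __contains__(self, key):
        return key in self.entries

    def lookup(self, key):
        """Return True on a hit and mark the key most recently used."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        return False

    def insert(self, key):
        self.entries[key] = True
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used


def build_provenance(history):
    """Map each process to the files it touched earlier, in first-access order."""
    prov = {}
    for proc, path in history:
        files = prov.setdefault(proc, [])
        if path not in files:
            files.append(path)
    return prov


def hit_rate(trace, capacity, provenance=None, degree=2):
    """Replay (process, file) metadata requests and return the hit rate.
    With provenance, a miss also prefetches up to `degree` uncached files the
    same process touched before (hypothetical policy)."""
    cache = LRUCache(capacity)
    hits = 0
    for proc, path in trace:
        if cache.lookup(path):
            hits += 1
            continue
        if provenance is not None:
            related = [f for f in provenance.get(proc, [])
                       if f != path and f not in cache]
            for f in related[:degree]:
                cache.insert(f)
        cache.insert(path)                     # demand-fetched entry becomes MRU
    return hits / len(trace)


# Toy usage (made-up trace): an earlier run records which files each process used.
history = [("build", "a.c"), ("build", "a.h"), ("build", "b.c"),
           ("test", "t1"), ("test", "t2")]
trace = history + [("build", "a.c")]
prov = build_provenance(history)
```

On this toy trace with a four-entry cache, plain LRU (`hit_rate(trace, 4)`) misses every request, while the provenance-driven variant (`hit_rate(trace, 4, prov)`) hits on three of six requests. Real gains depend on how accurate the mined correlations are, since wrong prefetches evict useful entries.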

SPMario: Scale up MapReduce with I/O-Oriented Scheduling for the GPU

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753309

Yang Liu, Hung-Wei Tseng, S. Swanson

Citations: 2

A 64 kb differential single-port 12T SRAM design with a bit-interleaving scheme for low-voltage operation in 32 nm SOI CMOS

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753333

Samira Ataei, J. Stine, Matthew R. Guthaus

Abstract: In this paper, a novel differential single-port 12T SRAM bitcell is presented. The bitcell uses a read buffer to eliminate read disturbance, which improves read stability and yields a read static noise margin equal to its hold static noise margin. Using a column-based select signal, the bitcell is free of half-select disturbance, which facilitates a bit-interleaving structure so that conventional error-correcting code techniques can reduce multi-bit soft errors. By boosting the wordline and select-signal voltages, the bitcell reads and writes without error at 300 mV, while data can be held down to 250 mV in standby mode. Bitline leakage suppression in the 12T bitcell allows more bitcells per bitline for high-density SRAMs and provides faster read operation. This paper also introduces OpenRAM, an open-source memory compiler that provides a platform for the generation, characterization, and verification of fabricable memory designs across various technologies, sizes, and configurations. Using OpenRAM, a 64 kb 12T SRAM macro is designed in IBM 32 nm SOI CMOS technology; it operates down to 0.3 V at 50 MHz and also functions at 0.9 V at 2.2 GHz.

Citations: 14
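Bit interleaving, which the half-select-free 12T cell makes practical, places bits of different words in adjacent columns, so a particle strike that upsets several neighboring cells flips at most one bit per word as long as the burst spans no more columns than the interleaving degree. The plain-Python sketch below demonstrates that mapping. The column-mapping formula, the interleaving degree of 4 in the usage, and the assumption of a single-error-correcting, double-error-detecting (SEC-DED) code per word are illustrative; the abstract does not specify the paper's row organization or ECC.

```python
def column_of(word, bit, degree):
    """Physical column of `bit` of `word` when `degree` words share a row,
    bit-interleaved (assumed mapping: bit-major, word-minor)."""
    return bit * degree + word


def locate(column, degree):
    """Inverse of column_of: the (word, bit) stored in a physical column."""
    return column % degree, column // degree


def upsets_per_word(first_column, burst_length, degree):
    """Flipped-bit count per word for a strike upsetting adjacent columns."""
    counts = {}
    for column in range(first_column, first_column + burst_length):
        word, _ = locate(column, degree)
        counts[word] = counts.get(word, 0) + 1
    return counts


def secded_correctable(counts):
    """A per-word SEC-DED code corrects at most one flipped bit in each word."""
    return all(n <= 1 for n in counts.values())
```

With four-way interleaving, a burst over columns 5 to 8 (`upsets_per_word(5, 4, 4)`) touches four different words once each, which per-word SEC-DED can correct; without interleaving (`degree=1`) the same burst puts four errors into one word, which SEC-DED cannot correct.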

DOART: A low-power and low-latency Network-on-Chip

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753301

W. Zong, Qiang Xu

Citations: 2

Hardware thread reordering to boost OpenCL throughput on FPGAs

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753288

Amir Momeni, H. Tabkhi, G. Schirner, D. Kaeli

Abstract: The availability of OpenCL for FPGAs has raised new questions about the efficiency of massive thread-level parallelism on FPGAs. The general trend is toward deep pipelining and in-order execution of many OpenCL threads across a shared data-path. While this can be very effective for regular kernels, its efficiency diminishes significantly for irregular kernels with runtime-dependent control flow, so new approaches are needed to improve FPGA execution efficiency for irregular OpenCL kernels. This paper proposes a novel solution, called Hardware Thread Reordering (HTR), to boost the throughput of FPGAs when executing irregular kernels with non-deterministic runtime control flow. The key insight of HTR is out-of-order OpenCL thread execution over a shared data-path to achieve significantly higher throughput. Thread reordering is performed at basic-block granularity. The synthesized basic blocks are extended with independent pipeline control signals and context registers to bypass the live values of reordered threads. We demonstrate the efficiency of the proposed solution on three parallel irregular kernels, using the LegUp tool to compare the baseline (in-order) data-path with the HTR-enhanced data-path. Our RTL simulation results demonstrate that the HTR-enhanced data-path achieves up to an 11× increase in kernel throughput at very low overhead (less than a 2× increase in FPGA resources).

Citations: 4
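The throughput argument for reordering threads can be illustrated with a toy cycle-level model in Python; it is not the HTR hardware or the LegUp data-path. Assumptions: a single loop basic block in which a thread's next iteration can issue only `latency` cycles after its previous one (a loop-carried dependence), at most one iteration issued per cycle, threads admitted in program order, and `contexts` hypothetical context registers that hold threads until their last iteration drains. With one context, a thread must leave the block before the next enters, which models strict in-order execution; with more contexts, a ready thread may overtake one that is still looping.

```python
def in_order_cycles(trips, latency):
    """Strict in-order toy: each thread finishes all its iterations before
    the next thread enters the loop block."""
    return sum(trips) * latency


def reordered_cycles(trips, latency, contexts):
    """Toy out-of-order issue. trips[t] >= 1 is thread t's loop trip count."""
    remaining = list(trips)
    ready_at = {}   # active thread -> cycle its next iteration may issue
    done_at = {}    # thread whose last iteration issued -> cycle it drains
    next_thread, cycle, finish = 0, 0, 0
    while next_thread < len(trips) or ready_at or done_at:
        # Retire threads whose final iteration has drained, freeing a context.
        for t in [t for t, c in done_at.items() if c <= cycle]:
            del done_at[t]
        # Admit new threads in program order while context registers are free.
        while next_thread < len(trips) and len(ready_at) + len(done_at) < contexts:
            ready_at[next_thread] = cycle
            next_thread += 1
        # Issue at most one iteration per cycle, oldest ready thread first.
        ready = [t for t, c in ready_at.items() if c <= cycle]
        if ready:
            t = min(ready)
            remaining[t] -= 1
            del ready_at[t]
            if remaining[t] == 0:
                done_at[t] = cycle + latency
                finish = max(finish, cycle + latency)
            else:
                ready_at[t] = cycle + latency
        cycle += 1
    return finish
```

For trip counts `[1, 3, 1, 1]` and a 4-cycle loop latency, the in-order model needs 24 cycles (and `reordered_cycles` with one context reproduces that), while four contexts finish in 13 cycles because the three short threads complete while the long one waits on its dependence.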