Guang Sun, Chia-Wei Chang, Bill Lin, Lieguang Zeng
{"title":"Oblivious routing design for mesh networks to achieve a new worst-case throughput bound","authors":"Guang Sun, Chia-Wei Chang, Bill Lin, Lieguang Zeng","doi":"10.1109/ICCD.2012.6378674","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378674","url":null,"abstract":"1/2 network capacity is often believed to be the limit of worst-case throughput for mesh networks. However, this paper provides a new worst-case throughput bound, which is higher than 1/2 network capacity, for odd radix two-dimensional mesh networks. In addition, we propose a routing algorithm called U2TURN that can achieve this worst-case throughput bound for odd radix meshes. For even radix meshes, we prove that U2TURN achieves the optimal worst-case throughput, namely, half of network capacity. U2TURN considers all routing paths with at most 2 turns and distributes the traffic loads uniformly in both X and Y dimensions. Theoretical analysis and simulation results show that U2TURN outperforms existing routing algorithms in worst-case throughput. Moreover, U2TURN achieves good average-throughput at the expense of approximately 1.5× minimal average hop count. For asymmetric meshes, we further propose an algorithm called “U2TURN-A” and provide theoretical analysis for different algorithms. Both theoretical analysis and simulation show that U2TURN and U2TURN-A outperform existing algorithms VAL, DOR and O1TURN in both worst-case and average throughput for asymmetric meshes.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131500847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling economics of LSI design and manufacturing for test design selection","authors":"H. Ichihara, N. Shimizu, T. Iwagaki, Tomoo Inoue","doi":"10.1109/ICCD.2012.6378701","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378701","url":null,"abstract":"Many test designs (or DFTs: designs-for-testability) have been proposed to overcome various issues around LSI testing. In this paper, we propose a cost and benefit model for comparing several test designs in terms of the final profit of logic LSI design and manufacturing. Test designs can affect chip area, testing time, test generation time and fault coverage; in the proposed model, we clarify the relationship among these factors for major three test designs: scan design, built-in self-test (BIST) design and test compression design. The proposed model reveals the final profit for each test design in a given LSI design and manufacturing environment, so that it can designate a suitable test design in the early stage of LSI design flow. We show an example of application of the proposed model for test design selection in a given environment.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124931566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DuSCA: A multi-channeling strategy for doubling communication capacity in wireless NoC","authors":"Yi Wang, Dan Zhao, Jian Li","doi":"10.1109/ICCD.2012.6378620","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378620","url":null,"abstract":"To bridge the widening gap between computation requirements and communication efficiency faced by many-core chips, Wireless Network-on-Chip (WiNoC) has been proposed by using ultra-wideband interconnect. While prior research has demonstrated the salient features of WiNoC as high perlink data rate, high accumulated bandwidth, high flexibility, low overhead and low power consumption, this research aims to develop a multi-access WiNoC to substantially improve the end-to-end performance of on-chip communication. Enabled by time hopping PPM multi-channel capability, we propose an efficient multi-channel distribution and arbitration scheme for improving communication concurrency and resolving channel competition among multiple users to achieve the desired network performance. Our simulation studies based on synthetic traffics demonstrate the efficiency, cost effectiveness and scalability of the channel arbitration scheme and the promising network performance of WiNoC.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115245735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative study of wearout mechanisms in state-of-art microprocessors","authors":"Chang-Chih Chen, Fahad Ahmed, L. Milor","doi":"10.1109/ICCD.2012.6378651","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378651","url":null,"abstract":"In this work, we perform a comparative study of different wearout mechanisms affecting the state-of-art microprocessor systems. Taking into account the detailed thermal and electrical stress profiles, we present a methodology to accurately estimate the lifetime due to each mechanism. The lifetime-limiting wearout mechanisms are highlighted using standard benchmarks along with the reliability-critical microprocessor functional units.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114318783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptable intrusion detection using partial runtime reconfiguration","authors":"M. Rahmatian, H. Kooti, I. Harris, E. Bozorgzadeh","doi":"10.1109/ICCD.2012.6378633","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378633","url":null,"abstract":"Intrusion detection approaches have been presented which detect anomalous malware behavior at runtime. Most techniques involve software-based analysis which is too slow to support the tight timing constraints often imposed on embedded systems. We propose a hardware-based intrusion detection approach which does not alter the functional performance of the system. When using a real-time operating system, the executing process changes several times each second, requiring fast adaptation on the part of the intrusion detection mechanism. We present a technique to exploit the partial runtime reconfiguration feature present on many modern field programmable gate arrays (FPGAs) to adapt intrusion detection to a new process at each context switch. The use of runtime reconfiguration enables the flexibility of software-based approaches with the performance benefits of hardware-based approaches.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131137093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECC string: Flexible ECC management for low-cost error protection of L2 caches","authors":"Jeongkyu Hong, Soontae Kim","doi":"10.1109/ICCD.2012.6378699","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378699","url":null,"abstract":"Conventional error correcting codes (ECC) scheme for caches is based on fixed mapping between cache words and ECC check bits, and fixed ECC word granularity, which leads to inefficient usage of ECC check bits. In contrast, we propose to use the ECC check bits flexibly for low-cost error protections of L2 caches. Our ECC scheme works at word level while the conventional ECC scheme works at cache line or set level; Our scheme protects only dirty words. In addition, our scheme utilizes variable ECC word granularities; Dirty words that are unlikely to be modified further are protected together with larger ECC word granularity. Our scheme reduces DRAM and data bus energy overheads by 28% and 45% on average, respectively, with the same area overhead as the previously proposed competitive scheme.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123675770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A retrospective look at xpipes: The exciting ride from a design experience to a design platform for nanoscale networks-on-chip","authors":"D. Bertozzi, L. Benini","doi":"10.1109/ICCD.2012.6378614","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378614","url":null,"abstract":"This paper gives a retrospective look at the xpipes framework, and documents its evolution from a promising network-on-chip (NoC) design experience to a comprehensive design platform for the next-generation of nanoscale NoCs. Since the early days of xpipes, its cross-layer approach to NoC design has given a significant contribution to bridge the gap between the NoC concept and an industry-relevant interconnect technology.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"500 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128208816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exposing vulnerabilities of untrusted computing platforms","authors":"Yier Jin, M. Maniatakos, Y. Makris","doi":"10.1109/ICCD.2012.6378629","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378629","url":null,"abstract":"This work seeks to expose the vulnerability of un-trusted computing platforms used in critical systems to hardware Trojans and combined hardware/software attacks. As part of our entry in the Cyber Security Awareness Week (CSAW) Embedded System Challenge hosted by NYU-Poly in 2011, we developed and presented 10 such processor-level hardware Trojans. These are split in five categories with various impacts, such as altering instruction memory, modifying the communication channel, stealing user information, changing interrupt handler location and RC-5 encryption algorithm checking of a medium complexity micro-processor (8051). Our work serves as a good starting point for researchers to develop Trojan detection and prevention methodologies on modern processor and to ensure trustworthiness of computing platforms.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127208683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing pipelined delay lines with dynamically-adaptive granularity for low-energy applications","authors":"Christos Vezyrtzis, Y. Tsividis, S. Nowick","doi":"10.1109/ICCD.2012.6378660","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378660","url":null,"abstract":"A calibrated delay line is a key component in many modern digital systems. Traditionally, these lines are designed as real-time pipelines with static granularity, fine enough to handle a worst-case input rate. However, due to their rigid structure, they have sub-optimal energy for low- and varying-rate input streams. We introduce a complete methodology for designing reconfigurable delay lines which dynamically adapt granularity to traffic, on-the-fly, without stalling or disturbing normal operation. These lines have two modes: coarse- and fine-grain. During sparser traffic, the system is reconfigured to coarse-grain mode, thereby reducing total energy, and it reverts to fine-grain mode during denser traffic. In each case, overall delay is preserved. This strategy is especially beneficial for applications where input traffic is highly varied. The particular focus of this paper is on one promising domain, continuous-time digital signal processors (CT DSP's), a new class of processors targeting low-energy applications. The proposed system includes two lightweight asynchronous control blocks: a digital controller to continuously monitor input traffic, and a micropipeline to dynamically reconfigure the entire delay line. With a complete implementation in a 0.13 um IBM CMOS technology, post-layout simulations demonstrate an average overall dynamic power reduction up to 45.5% compared to a non-adaptive design, with only minimal area overhead. The design methodology is modular, supporting extensions to multiple configuration modes to provide even greater power reduction for a variety of input traffic. While results are presented for CT DSP's, significant benefits are also expected in many other domains where delay lines are used.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130051721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast development of hardware-based run-time monitors through architecture framework and high-level synthesis","authors":"Mohamed Ismail, G. Suh","doi":"10.1109/ICCD.2012.6378669","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378669","url":null,"abstract":"Recent work has shown that hardware-based runtime monitoring techniques can significantly enhance security and reliability of computing systems with minimal performance and energy overheads. However, the cost and time for implementing such a hardware-based mechanism presents a major challenge in deploying the run-time monitoring techniques in real systems. This paper addresses this design complexity problem through a common architecture framework and high-level synthesis. Similar to customizable processors such as Tensilica Xtensa where designers only need to write a small piece of code that describes a custom instruction, our framework enables designers to only specify monitoring operations. The framework provides common functions such as collecting a trace of execution, maintaining meta-data, and interfacing with software. To further reduce the design complexity, we also explore using a high-level synthesis tool (Cadence C-to-Silicon) so that hardware monitors can be described in a high-level language (SystemC) instead of in RTL such as Verilog and VHDL. To evaluate our approach, we implemented a set of monitors including soft-error checking, uninitialized memory checking, dynamic information flow tracking, and array boundary checking in our framework. Our results suggest that our monitor framework can greatly reduce the amount of code that needs to be specified for each extension and the high-level synthesis can achieve comparable area, performance, and power consumption to handwritten RTL.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130998944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}