Shahar Kvatinsky, A. Kolodny, U. Weiser, E. Friedman
{"title":"Memristor-based IMPLY logic design procedure","authors":"Shahar Kvatinsky, A. Kolodny, U. Weiser, E. Friedman","doi":"10.1109/ICCD.2011.6081389","DOIUrl":"https://doi.org/10.1109/ICCD.2011.6081389","url":null,"abstract":"Memristors can be used as logic gates. No design methodology exists, however, for memristor-based combinatorial logic. In this paper, the design and behavior of a memristive-based logic gate - an IMPLY gate - are presented and design issues such as the tradeoff between speed (fast write times) and correct logic behavior are described, as part of an overall design methodology. A memristor model is described for determining the write time and state drift. It is shown that the widely used memristor model - a linear ion drift memristor - is impractical for characterizing an IMPLY logic gate, and a different memristor model is necessary such as a memristor with a current threshold.","PeriodicalId":354015,"journal":{"name":"2011 IEEE 29th International Conference on Computer Design (ICCD)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116211651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RoShaQ: High-performance on-chip router with shared queues","authors":"A. Tran, B. Baas","doi":"10.1109/ICCD.2011.6081402","DOIUrl":"https://doi.org/10.1109/ICCD.2011.6081402","url":null,"abstract":"On-chip router typically has buffers dedicated to its input or output ports for temporarily storing packets in case contention occurs on output physical channels. Buffers, unfortunately, consume significant portions of router area and power. While running a traffic trace, however, not all input ports of routers have incoming packets needed to be transferred at the same time. As a result, a large number of buffer queues in the network are empty while other queues are mostly busy. This observation motivates us to design RoShaQ, a router architecture that maximizes buffer utilization by allowing to share multiple buffer queues among input ports. Sharing queues, in fact, makes using buffers more efficient hence is able to achieve higher throughput when the network load becomes heavy. On the other side, at light traffic load, our router achieves low latency by allowing packets to effectively bypass these shared queues. Experimental results show that RoShaQ is 21% less latency and 14% higher saturation throughput than a typical virtual-channel (VC) router with 4% higher power and 16% larger area. Due to its higher performance, RoShaQ consumes 7% less energy per a transferred packet than a VC router given the same buffer space capacity.","PeriodicalId":354015,"journal":{"name":"2011 IEEE 29th International Conference on Computer Design (ICCD)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127076483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A simple pipelined squaring circuit for DSP","authors":"V. Risojevic, A. Avramović, Z. Babic, P. Bulić","doi":"10.1109/ICCD.2011.6081392","DOIUrl":"https://doi.org/10.1109/ICCD.2011.6081392","url":null,"abstract":"There are many digital signal processing applications where a shorter time delay of algorithms and efficient implementations are more important than accuracy. Since squaring is one of the fundamental operations widely used in digital signal processing algorithms, approximate squaring is proposed. We present a simple way of approximate squaring that allows achieving a desired accuracy. The proposed method uses the same simple combinational logic for the first approximation and correction terms. Performed analysis for various bit-length operands and level of approximation showed that maximum relative errors and average relative errors decrease significantly by adding more correction terms. The proposed squaring method can be implemented with a great level of parallelism. The pipelined implementation is also proposed in this paper. The proposed squarer achieved significant savings in area and power when compared to multiplier based squarer. As an example, an analysis of the impact of Euclidean distance calculation by approximate squaring on image retrieval is performed.","PeriodicalId":354015,"journal":{"name":"2011 IEEE 29th International Conference on Computer Design (ICCD)","volume":"134 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124257327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware Trojans: The defense and attack of integrated circuits","authors":"Trey Reece, W. H. Robinson","doi":"10.1109/ICCD.2011.6081412","DOIUrl":"https://doi.org/10.1109/ICCD.2011.6081412","url":null,"abstract":"Despite the increasing threat of hardware Trojans in modern circuits, there is currently a lack of Trojans available with which to test anti-Trojan techniques. This paper presents both a defensive technique designed to aid in the detection of Trojans in integrated circuits, as well as a design ideology to confound similar defensive techniques. The defense structure was awarded 2nd place in the 2009 Embedded Systems Challenge (ESC) competition, and Trojans following the attack ideology were submitted to the same competition the following year.","PeriodicalId":354015,"journal":{"name":"2011 IEEE 29th International Conference on Computer Design (ICCD)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130563393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kanit Therdsteerasukdi, Gyungsu Byun, J. Ir, Glenn D. Reinman, J. Cong, Mau-Chung Frank Chang
{"title":"The DIMM tree architecture: A high bandwidth and scalable memory system","authors":"Kanit Therdsteerasukdi, Gyungsu Byun, J. Ir, Glenn D. Reinman, J. Cong, Mau-Chung Frank Chang","doi":"10.1109/ICCD.2011.6081428","DOIUrl":"https://doi.org/10.1109/ICCD.2011.6081428","url":null,"abstract":"The demand for capacity and off-chip bandwidth to DRAM will continue to grow as we integrate more cores onto a die. However, as the data rate of DRAM has increased, the number of DIMMs supported on a multi-drop bus has decreased. Therefore, traditional memory systems are not sufficient to meet both these demands. We propose the DIMM tree architecture for better scalability by connecting the DIMMs as a tree. The DIMM tree architecture is able to grow the number of DIMMs exponentially with each level of latency in the tree. We also propose application of Multiband Radio Frequency Interconnect (MRF-I) to the DIMM tree architecture for even greater scalability and higher throughput. The DIMM tree architecture without MRF-I was able to scale up to 64 DIMMs with only an 8% degradation in throughput over an ideal system. The DIMM tree architecture with MRF-I was able to increase throughput by 68% (up to 200%) on a 64-DIMM system over a 4-DIMM system.","PeriodicalId":354015,"journal":{"name":"2011 IEEE 29th International Conference on Computer Design (ICCD)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134498064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reyhaneh Jabbarvand Behrouz, M. Modarressi, H. Sarbazi-Azad
{"title":"A reconfigurable fault-tolerant routing algorithm to optimize the network-on-chip performance and latency in presence of intermittent and permanent faults","authors":"Reyhaneh Jabbarvand Behrouz, M. Modarressi, H. Sarbazi-Azad","doi":"10.1109/ICCD.2011.6081436","DOIUrl":"https://doi.org/10.1109/ICCD.2011.6081436","url":null,"abstract":"As the semiconductor industry advances to the deep sub-micron and nano technology points, the on-chip components are more prone to the defects during manufacturing and faults during system operation. Consequently, fault tolerant techniques are essential to improve the yield of modern complex chips. We propose a fault-tolerant routing algorithm that keeps the negative effect of faulty components on the NoC power and performance as low as possible. Targeting intermittent faults, we achieve fault tolerance by employing a simple and fast mechanism composed of two processes: NoC monitoring and route adaption. Experimental results show the effectiveness of the proposed technique, in that it offers lower average message latency and power consumption and a higher reliability, compared to some related work.","PeriodicalId":354015,"journal":{"name":"2011 IEEE 29th International Conference on Computer Design (ICCD)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133422038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Red team: Design of intelligent hardware trojans with known defense schemes","authors":"Xuehui Zhang, Nicholas Tuzzio, M. Tehranipoor","doi":"10.1109/ICCD.2011.6081416","DOIUrl":"https://doi.org/10.1109/ICCD.2011.6081416","url":null,"abstract":"In the past few years, several Trojan detection approaches have been developed to prevent the damages caused by Trojans, making Trojan insertion more and more difficult. As part of the Embedded Systems Challenge (ESC), we were given two different designs with two different Trojan detection methods, and we tried to design Trojans which could avoid detection. We developed Trojans that remain undetectable by the delay fingerprinting and ring-oscillator monitoring Trojan detection methods embedded into these benchmarks. Experimental results on a Xilinx FPGA demonstrate that most of our hardware Trojans were undetected using the inserted detection mechanisms.","PeriodicalId":354015,"journal":{"name":"2011 IEEE 29th International Conference on Computer Design (ICCD)","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121734819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the vulnerability of CMPs to soft errors with 3D stacked non-volatile memory","authors":"Guangyu Sun, E. Kursun, J. Rivers, Yuan Xie","doi":"10.1145/2491679","DOIUrl":"https://doi.org/10.1145/2491679","url":null,"abstract":"Spin-transfer Torque Random Access Memory (STT-RAM) emerges for on-chip memory in microprocessor architectures. Thanks to the magnetic field based storage STT-RAM cells have immunity to radiation induced soft errors that affect electrical charge based data storage, which is a major challenge in SRAM based caches in current microprocessors. In this study we explore the soft error resilience benefits and design trade offs of 3D-stacked STT-RAM for multi-core architectures. We use 3D stacking as an enabler for modular integration of STT-RAM caches with minimum disruption in the baseline processor design flow, while providing further interconnectivity and capacity advantages. We take an in-depth look at alternative replacement schemes in terms of performance, power, temperature, and reliability trade-offs to capture the multi-variable optimization challenges microprocessor architectures face. We analyze and compare the characteristics of STT-RAM, SRAM, and DRAM alternatives for various levels of the cache hierarchy in terms of reliability.","PeriodicalId":354015,"journal":{"name":"2011 IEEE 29th International Conference on Computer Design (ICCD)","volume":"78 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126008873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Techniques for LI-BDN synthesis for hybrid microarchitectural simulation","authors":"Tyler S. Harris, Zhuo Ruan, D. Penry","doi":"10.1109/ICCD.2011.6081405","DOIUrl":"https://doi.org/10.1109/ICCD.2011.6081405","url":null,"abstract":"Computer designers rely upon near-cycle-accurate microarchitectural simulation to explore the design space of new systems. Unfortunately, such simulators are becoming increasingly slow as systems become more complex. Hybrid simulators which offload some of the simulation work onto FPGAs can increase the speed; however, such simulators must be automatically synthesized or the time to design them becomes prohibitive. Furthermore, FPGA implementations of simulators may require multiple FPGA clock cycles to implement behavior that takes place within one simulated clock cycle, making correct arbitrary composition of simulator components impossible and limiting the amount of hardware concurrency which can be achieved. Latency-Insensitive Bounded Dataflow Networks (LI-BDNs) have been suggested as a means to permit composition of simulator components in FPGAs. However, previous work has required that LI-BDNs be created manually. This paper introduces techniques for automated synthesis of LI-BDNs from the processes of a System-C microarchitectural model. We demonstrate that LI-BDNs can be successfully synthesized. We also introduce a technique for reducing the overhead of LI-BDNs when the latency-insensitive property is unnecessary, resulting in up to a 60% reduction in FPGA resource requirements.","PeriodicalId":354015,"journal":{"name":"2011 IEEE 29th International Conference on Computer Design (ICCD)","volume":"282 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120963477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yu Pang, Shaoquan Wang, Zhilong He, Jinzhao Lin, S. Sultana, K. Radecka
{"title":"Positive Davio-based synthesis algorithm for reversible logic","authors":"Yu Pang, Shaoquan Wang, Zhilong He, Jinzhao Lin, S. Sultana, K. Radecka","doi":"10.1109/ICCD.2011.6081399","DOIUrl":"https://doi.org/10.1109/ICCD.2011.6081399","url":null,"abstract":"Reversible logic is a key technique for quantum computing so leading to low-power designs. However, current synthesis algorithms for reversible circuits are low efficiency and do not obtain optimized reversible circuits, so they are only applied to small logic functions. In this paper, we propose a new method based on positive Davio expansion to synthesize reversible circuits, which generates a positive Davio decision diagram for a logic function and transfers diagram nodes to reversible circuits. The algorithm has advantages of optimizing area and fast synthesis speed compared to BDD (Binary decision diagram) based and RM (Reed-Muller) based synthesis method, so it can be adapted for large functions.","PeriodicalId":354015,"journal":{"name":"2011 IEEE 29th International Conference on Computer Design (ICCD)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115301726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}