Zachary Cashero, Allen Chen, Ryan Hoppal, Tom Chen
{"title":"Fast Evaluation of Analog Circuits Using Linear Programming","authors":"Zachary Cashero, Allen Chen, Ryan Hoppal, Tom Chen","doi":"10.1109/ISVLSI.2010.94","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.94","url":null,"abstract":"Although analog circuits are usually only a small part of an overall mixed-signal design, the time it takes to design these circuits is significantly longer. On the digital side, design automation and optimization tools have been around for a long time and are well established. These tools free up the designer to focus on more innovative tasks and allow for greater design productivity. The analog side, however, has yet to reach this point. There is much ongoing research in this area and many different approaches to try to mitigate the complexity of designing analog circuits. We propose an algorithm that utilizes the fast and efficient execution of linear programming to provide a solution for single objective optimization of small circuits. We show the results of this algorithm on three sample circuits with the objectives of maximizing gain and bandwidth. The results from our proposed optimization algorithm were within 10 percent of the maximum gain/bandwidth derived using traditional manual optimization methods.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115736845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Artificial Neural Network-Based Hotspot Prediction Mechanism for NoCs","authors":"E. Kakoulli, V. Soteriou, T. Theocharides","doi":"10.1109/ISVLSI.2010.50","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.50","url":null,"abstract":"Hotspots are network on-chip (NoC) routers or modules in systems on-chip (SoCs) which occasionally receive packetized traffic at a rate higher than they can consume it. This adverse phenomenon greatly reduces the performance of an NoC, especially in the case of today’s widely-employed wormhole flow-control, as backpressure can cause the buffers of neighboring routers to quickly fill-up leading to a spatial spread in congestion that can cause the network to saturate. Even worse, such situations may lead to deadlocks. Thus, a hotspot prevention mechanism can be greatly beneficial, as it can potentially enable the interconnection system to adjust its behavior and prevent the rise of potential hotspots, subsequently sustaining NoC performance and efficiency. Unfortunately, hotspots cannot be known a-priori in NoCs used in general-purpose systems as application demands are not predetermined unlike in application-specific SoCs, making hotspot prediction and subsequently prevention difficult. In this paper we present an artificial neural network-based hotspot prediction mechanism that can be potentially used in tandem with a hotspot avoidance mechanism for handling an unforeseen hotspot formation efficiently. The network uses buffer utilization statistical data to dynamically monitor the interconnect fabric, and reactively predicts the location of an about to-be-formed hotspot, allowing enough time for the system to react to these potential hotspots. The neural network is trained using synthetic traffic models, and evaluated using both synthetic and real application traces. Results indicate that a relatively small neural network can predict hotspot formation with accuracy ranges between 76% to 92% when evaluated on two different mesh NoCs.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131281407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Daneshtalab, M. Ebrahimi, P. Liljeberg, J. Plosila, H. Tenhunen
{"title":"Input-Output Selection Based Router for Networks-on-Chip","authors":"M. Daneshtalab, M. Ebrahimi, P. Liljeberg, J. Plosila, H. Tenhunen","doi":"10.1109/ISVLSI.2010.76","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.76","url":null,"abstract":"In this paper, we propose a novel on-chip router architecture for avoiding congested areas in regular two-dimensional on-chip networks. This architecture takes advantage of an efficient adaptive routing model based on the Hamiltonian path for both the multicast and unicast traffic. The output selection of the proposed architecture is based on the congestion condition of neighboring routers and the input selection is based on the Weighted Round Robin mechanism which allows packets to be serviced from each input port according to its congestion level. The simulation results show that in multicast, unicast, and mixed traffic profiles the proposed model has lower average delays and lower average and peak power compared to previously proposed models.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"285 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126854686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Boey, Yingxi Lu, Máire O’Neill, Roger Francis Woods
{"title":"Differential Power Analysis of CAST-128","authors":"K. Boey, Yingxi Lu, Máire O’Neill, Roger Francis Woods","doi":"10.1109/ISVLSI.2010.14","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.14","url":null,"abstract":"Power analysis is used to reveal the secret key of security devices by monitoring the power consumption of certain cryptographic algorithm operations through a statistical analysis approach known as Differential Power Analysis (DPA). Whilst this has been applied extensively to attacks on FPGA devices, there has been little research into attacks on ASIC devices. Although standard DPAs are essentially independent of the block cipher that they target, some are less susceptible than others due to algorithm’s structure, and therefore more difficult to attack such as the CAST-128. In this paper, we outline the first reported power analysis attack of CAST-128 as it falls into the category just outlined and it is the only algorithm that has not been practically broken either on FPGA or ASIC, it is also a common block cipher used in Canada. The paper outlines an approach that reveals all 128 bits of the secret key within 300,500 power traces, highlighting insights on attacking the registers rather than the Sbox. Finally, the effect of applying the Hamming weight power model on different widths of the target register under attack in ASIC device is evaluated.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115495376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DC Offset Modeling and Noise Minimization for Differential Amplifier in Subthreshold Operation","authors":"Kapil K. Rajput, Anil K. Saini, S. C. Bose","doi":"10.1109/ISVLSI.2010.46","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.46","url":null,"abstract":"This work presents the rigorous formulation of input referred offset voltage for differential amplifier, having the input pair devices in subthreshold region of operation. The formulation has been verified in 0.35 μm and 0.18 μm CMOS technologies by using Monte Carlo Simulation. Minimization of 1/f noise is the additional advantage of this method.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125335608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Fattah, Abdurrahman Manian, A. Rahimi, S. Mohammadi
{"title":"A High Throughput Low Power FIFO Used for GALS NoC Buffers","authors":"Mohammad Fattah, Abdurrahman Manian, A. Rahimi, S. Mohammadi","doi":"10.1109/ISVLSI.2010.44","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.44","url":null,"abstract":"In Networks-on-chip, increasing the depth of routers’ buffers even by a few stages can have a significant effect on average latency and saturation threshold of the network. However, the price to pay could be high in terms of power and silicon area. In this paper, we propose a low power, high throughput asynchronous FIFO suitable for buffers of GALS NoC routers. We consistently compare the performance with regards to power, area and throughput of our FIFO with some different FIFO structures, by exploring their design trade-offs with various number of stages and for different data lengths. These structures are simulated in 90nm CMOS technology with accurate spice simulations, where results show a low power consumption and latency, with a higher throughput. Finally, a back-annotated HDL model of a 4x4 mesh network, wherein a fully asynchronous router is implemented, shows better average latency, saturation threshold and power tradeoffs, using the proposed FIFO.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"180 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120866903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Candaele, Sylvain Aguirre, Michel Sarlotte, Iraklis Anagnostopoulos, S. Xydis, A. Bartzas, D. Bekiaris, Dimitrios Soudris, Zhonghai Lu, Xiaowen Chen, Jean-Michel Chabloz, A. Hemani, A. Jantsch, G. Vanmeerbeeck, J. Kreku, Kari Tiensyrjä, F. Ieromnimon, D. Kritharidis, Andreas Wiefrink, B. Vanthournout, Philippe Martin
{"title":"Mapping Optimisation for Scalable Multi-core ARchiTecture: The MOSART Approach","authors":"B. Candaele, Sylvain Aguirre, Michel Sarlotte, Iraklis Anagnostopoulos, S. Xydis, A. Bartzas, D. Bekiaris, Dimitrios Soudris, Zhonghai Lu, Xiaowen Chen, Jean-Michel Chabloz, A. Hemani, A. Jantsch, G. Vanmeerbeeck, J. Kreku, Kari Tiensyrjä, F. Ieromnimon, D. Kritharidis, Andreas Wiefrink, B. Vanthournout, Philippe Martin","doi":"10.1109/ISVLSI.2010.71","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.71","url":null,"abstract":"The project will address two main challenges of prevailing architectures: 1) The global interconnect and memory bottleneck due to a single, globally shared memory with high access times and power consumption, 2) The difficulties in programming heterogeneous, multi-core platforms, in particular in dynamically managing data structures in distributed memory. MOSART aims to overcome these through a multi-core architecture with distributed memory organisation, a Network-on-Chip (NoC) communication backbone and configurable processing cores that are scaled, optimised and customised together to achieve diverse energy, performance, cost and size requirements of different classes of applications. MOSART achieves this by: 1) Providing platform support for management of abstract data structures including middleware services and a run-time data manager for NoC based communication infrastructure, 2) Developing tool support for parallelizing and mapping applications on the multi-core target platform and customizing the processing cores for the application.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"7 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121250487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel On-Chip Interconnection Topology for Mesh-Connected Processor Arrays","authors":"Xiaofang Wang","doi":"10.1109/ISVLSI.2010.86","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.86","url":null,"abstract":"Prior studies on packet-switching on-chip networks have primarily focused on the micro architecture of the router to reduce the communication latency. In this paper, we propose a novel interconnection topology for mesh-connected processor arrays. By sharing routers among PEs and PEs among routers, our network significantly reduces the average hop count for a packet, thereby reducing the network latency and improving the throughput of the network. The interconnection network also requires less area compared to the conventional mesh organization, leaving more resources for the computing fabric. Extensive simulation results show that the proposed network reduces the network latency by up to 50.3 for a multiprocessor with 64 PEs.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132164765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BLAS Comparison on FPGA, CPU and GPU","authors":"S. Kestur, John D. Davis, Oliver Williams","doi":"10.1109/ISVLSI.2010.84","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.84","url":null,"abstract":"High Performance Computing (HPC) or scientific codes are being executed across a wide variety of computing platforms from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subroutines (BLAS) using double-precision floating point on an FPGA, CPU and GPU. On the CPU and GPU, we utilize standard libraries on state-of-the-art devices. On the FPGA, we have developed parameterized modular implementations for the dot-product and Gaxpy or matrix-vector multiplication. In order to obtain optimal performance for any aspect ratio of the matrices, we have designed a high-throughput accumulator to perform an efficient reduction of floating point values. To support scalability to large data-sets, we target the BEE3 FPGA platform. We use performance and energy efficiency as metrics to compare the different platforms. Results show that FPGAs offer comparable performance as well as 2.7 to 293 times better energy efficiency for the test cases that we implemented on all three platforms.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134238111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liang Lu, Weiqiang Liu, Máire O’Neill, E. Swartzlander
{"title":"QCA Systolic Matrix Multiplier","authors":"Liang Lu, Weiqiang Liu, Máire O’Neill, E. Swartzlander","doi":"10.1109/ISVLSI.2010.53","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.53","url":null,"abstract":"Quantum-dot Cellular Automata (QCA) technology is a promising alternative to CMOS technology. It is attractive due to its fast speed, small area and low power consumption. To explore the characteristics of QCA technology, digital circuit design approaches have been investigated. Due to the inherent wire delay in this technology, QCA appears particularly suitable for pipelined architectures. Systolic arrays take advantage of pipelining and parallelism. Therefore, an investigation into systolic array design in QCA technology is provided in this paper. A case study of the first systolic matrix multiplier is designed and analyzed. The results show that by applying the systolic array structure to QCA designs, significant benefits can be achieved, particularly with large systolic array sizes, even more so than when applied to CMOS-based technology. QCA has significant advantages in terms of speed and area over CMOS technology, for instance, a factor of 12 smaller in terms of the area in this proposed matrix multiplier design when compared with the same CMOS 32nm implementation.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132135488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}