{"title":"FPGA implementation and analysis of random delay insertion countermeasure against DPA","authors":"Yingxi Lu, Máire O’Neill, J. McCanny","doi":"10.1109/FPT.2008.4762384","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762384","url":null,"abstract":"Security devices can reveal critical information about the cryptographic key from the power consumption of their circuits. Differential power analysis (DPA) is one of the most effective power analysis techniques. In recent years numerous countermeasures against the DPA attack of hardware implementations of security algorithms have been proposed. In this paper, we investigate the random delay insertion (RDI) countermeasure. Previous research has evaluated RDI for microprocessor implementations; however, its security properties in relation to hardware implementations have not been investigated in detail. We prove both theoretically and practically that it is an effective technique on FPGA devices and we propose a set of critical parameters that can be utilized to optimize a security algorithm design with RDI in terms of area, speed and power. In this work, we implement the first hardware security architecture with RDI on an FPGA device, and attack it using DPA. It is shown that RDI is an efficient countermeasure technique on FPGA in comparison to other countermeasures.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133721572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A low memory bandwidth Gaussian mixture model (GMM) processor for 20,000-word real-time speech recognition FPGA system","authors":"Kazuo Miura, Hiroki Noguchi, H. Kawaguchi, M. Yoshimoto","doi":"10.1109/FPT.2008.4762413","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762413","url":null,"abstract":"We propose a GMM processor for large vocabulary real-time continuous speech recognition. This processor achieves low operating frequency and low memory bandwidth using parallelization and vector look-ahead schemes, which are suitable to FPGA implementation. We designed the proposed processor on a Celoxica RC250 FPGA board, and confirmed that the required frequency and memory bandwidth for real-time operation are reduced by 89.8% and 84.2%, respectively. The 20,000-word real-time GMM computation is made at a frequency of 30.4 MHz and memory bandwidth of 47 Mbps, on the prototype.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"6 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114103217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adaptive pattern recognition hardware with on-chip shift register-based partial reconfiguration","authors":"H. Kawai, Y. Yamaguchi, M. Yasunaga, K. Glette, J. Tørresen","doi":"10.1109/FPT.2008.4762380","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762380","url":null,"abstract":"A pattern recognition system that can process a large amount of image data at high speed is required in many fields. In this paper, we propose an on-chip pattern recognition system that utilizes the reconfigurability of the FPGA. The features of the system are not only very high recognition speed but also an adaptive function. For example, when objects to be detected change appearance, recognition parameters must be changed to retain the recognition accuracy. The system can automatically adjust by executing on-chip partial reconfiguration. The system runs at 25 MHz and can return a recognition result in one clock cycle, 40 ns. To update the system, all processes needed for searching for the best recognition parameters, generating configuration data and reconfiguring the system are carried out within 30s.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127205005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting memory hierarchy for a Computational Fluid Dynamics accelerator on FPGAs","authors":"Hirokazu Morishita, Yasunori Osana, N. Fujita, H. Amano","doi":"10.1109/FPT.2008.4762383","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762383","url":null,"abstract":"Computational fluid dynamics (CFD) is an important tool for aeronautical engineers. Instead of expensive super-computers or clusters, using custom pipelines built on FPGAs is expected to be a cost effective solution to accelerate CFD. The problem is that to keep the pipeline busy is difficult because of the memory bandwidth. To deal with this problem, an effective memory access method using block-RAMs is implemented based on a careful survey about memory access pattern. This work targets two major subroutines in UPACS, a CFD software package. As a result, the amount of data transfer is reduced by about 40%. This shows 46-170 fold speed-up is expected by several Virtex-4 FPGAs compared to Itanium2 processor.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126895557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An analog reconfiguration-period adjustment technique for optically reconfigurable gate arrays","authors":"T. Mabuchi, Minoru Watanabe","doi":"10.1109/FPT.2008.4762400","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762400","url":null,"abstract":"In previously proposed ORGAs, the optical reconfiguration period was designed to be constant by assuming a worst-case reconfiguration speed. However, the diffraction efficiency of a holographic memory differs depending on the number of bright bits included in a configuration context. Therefore, previous ORGAs can not fully exploit reconfiguration performance. For that reason, this paper presents a proposal for an analog reconfiguration-period adjustment technique for ORGAs to reduce each reconfiguration period. The advantages are then discussed herein based on experimental results obtained using an ORGA system on which the technique is adopted.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124876492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating hardware simulation: Testbench code emulation","authors":"I. Mavroidis, I. Papaefstathiou","doi":"10.1109/FPT.2008.4762375","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762375","url":null,"abstract":"Today's verification challenges require high-performance simulation solutions, such as hardware simulation accelerators and emulators, that have been in use in hardware and electronic system design centers for approximately the last decade. In particular, in order to accelerate functional simulation, hardware emulation is used so as to offload calculation-intensive tasks from the software simulator. However, the communication overhead between the software simulator and the hardware emulator is becoming a new critical bottleneck. In our work we introduce a novel way of repartitioning the simulation between software and hardware in order to minimize this communication bottleneck. Using the techniques described in this paper we are able to offload a big part of the work that is traditionally done by the software simulator, onto the hardware emulator. Our experiments, using real-world designs, demonstrate that the proposed method reduces significantly the communication overhead and outperforms the conventional hardware emulation systems by a factor of more than 7. Finally, we provide a way of observing and modifying the internal state of the hardware emulator while the test is running.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131531179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ρ-VEX: A reconfigurable and extensible softcore VLIW processor","authors":"Stephan Wong, T. V. As, Geoffrey M. Brown","doi":"10.1109/FPT.2008.4762420","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762420","url":null,"abstract":"This paper presents the architectural design of a reconfigurable and extensible very long instruction word (VLIW) processor. In addition to architectural extensibility, our processor also supports reconfigurable operations. Furthermore, we present an application development framework to optimally exploit the freedom of reconfigurable operations. Because our processor is based on the VEX ISA, we already have a good compiler which is able to deal with ISA extensibility and reconfigurable operations. Our results show that different configurations of our processor lead to considerable cycle count reductions for a selected benchmark application.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130598930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High level quantitative interconnect estimation for Early Design Space Exploration","authors":"R. Meeuws, K. Sigdel, Y. Yankova, K. Bertels","doi":"10.1109/FPT.2008.4762407","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762407","url":null,"abstract":"In this paper, we present an approach for prediction of interconnect resources at the early stages of design. This approach was developed as an extension to the Quipu multi-dimensional quantitative prediction model for early design space exploration. Quipu is a part of the Delft Workbench project, a semi-automatic tool platform supporting integrated hardware-software co-design for heterogeneous computing systems. Because of the highly iterative nature of design in such tool platforms, fast and early estimates of hardware properties are required. One aspect of particular importance is the utilization of interconnect resources, which has increased with designs becoming larger, even to the point where some designs are no longer routable. We establish a method of estimating interconnect from a C-level description using partial least squares regression (PLSR) and software complexity metrics (SCM) for use in the Delft Workbench tool platform. We show that our approach can make predictions with an expected error of 31.6%.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130719670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing coarse-grained units in floating point hybrid FPGA","authors":"Chi Wai Yu, Alastair M. Smith, W. Luk, P. Leong, S. Wilton","doi":"10.1109/FPT.2008.4762366","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762366","url":null,"abstract":"This paper introduces a novel methodology to optimize coarse-grained floating point units (FPUs) in a hybrid FPGA. We employ common subgraph extraction to determine the number of floating point adders/subtracters (FAs), multipliers (FMs) and wordblocks (WBs) in the FPUs. We first study the area, speed and utilization trade-off of the selected FPU subgraphs in a set of floating point benchmark circuits. We then explore the impact of density and flexibility of FPUs on the system in terms of area, speed and routing resources. We derive an optimized coarse-grained FPU by considering both architectural and system level issues. The results show that: (1) embedding more types of coarse-grained FPU in the system causes at most 21.3% increase in delay, (2) the area of the system can be reduced by 27.4% by embedding high density subgraphs, (3) the high density subgraphs require 14.8% fewer routing resources.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134638488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernel sharing on reconfigurable multiprocessor systems","authors":"Philip C. Garcia, Katherine Compton","doi":"10.1109/FPT.2008.4762387","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762387","url":null,"abstract":"Because of the difficulty of increasing single-threaded processor performance, multi-core systems are becoming increasingly popular. These systems bring new challenges to the design of a reconfigurable computing system, with reconfigurable hardware potentially shared between multiple simultaneously-executing applications. In this paper, we examine how to best use reconfigurable hardware in a multiprocessor system. One of the key aspects of this work is improving overall system throughput by sharing configured circuits between multiple processes concurrently executing on the system. In this work, we show that using our extensions for sharing configured circuits between processes improves overall system throughput, and outperforms a static schedule of the kernels between the multiple processes.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132298486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}