Bradley D. Christiansen, Yong C. Kim, R. Bennington, Christopher J. Ristich
{"title":"Decoy circuits for FPGA design protection","authors":"Bradley D. Christiansen, Yong C. Kim, R. Bennington, Christopher J. Ristich","doi":"10.1109/FPT.2006.270351","DOIUrl":"https://doi.org/10.1109/FPT.2006.270351","url":null,"abstract":"Field-programmable gate arrays (FPGAs) are increasingly used in system designs, but their vulnerability to reverse engineering could lead to lost profits or security breaches. Thus, high FPGA design security is needed with low performance penalties and low realization and maintenance costs. Using a novel circuit modification method, common circuits were augmented with decoy circuits for protection. Security values for the original and modified circuits were calculated, and the original and modified circuits' execution times, power consumptions, and resource usages were collected from simulations. For the modified circuits, security improved by six orders of magnitude, yet execution times, power consumption, and resource usage increased by less than one order of magnitude. The proposed algorithm has demonstrated the potential for substantial increases in FPGA design security at a low cost, and could also be applied to application-specific integrated circuits (ASICs)","PeriodicalId":354940,"journal":{"name":"2006 IEEE International Conference on Field Programmable Technology","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127732632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fuzzy modular multiplication architecture and low complexity IPR-protection for FPGA technology","authors":"A. Hanoun, W. Adi, F. Mayer-Lindenberg, B. Soudan","doi":"10.1109/FPT.2006.270339","DOIUrl":"https://doi.org/10.1109/FPT.2006.270339","url":null,"abstract":"The strong possibility of pirating, reengineering and over-deployment is a major impediment to the commercialization of IP-cores in the FPGA design environment. A mechanism for IP-protection based on public key bitstream encryption has previously been proposed. This paper describes a reasonable cost practical realization of the modular multiplication function required for the previously proposed system. A technique called fuzzy modular multiplication is employed to decrease the cost of modular squaring computations required for the public key exchange. An implementation using the Virtex-4 device from Xilinx is demonstrated to illustrate the low complexity cost. A refinement of the IP exchange scenario for the proposed IP-protection system is also included in this paper","PeriodicalId":354940,"journal":{"name":"2006 IEEE International Conference on Field Programmable Technology","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115991443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Vargas, L. Picolli, Antonio A. de Alecrim, M. Moraes, Marcio Gama
{"title":"Summarizing a time-sensitive control-flow checking monitoring for multitask systems-on-chip","authors":"F. Vargas, L. Picolli, Antonio A. de Alecrim, M. Moraes, Marcio Gama","doi":"10.1109/FPT.2006.270320","DOIUrl":"https://doi.org/10.1109/FPT.2006.270320","url":null,"abstract":"This paper summarizes a new approach based on a watchdog infrastructure intellectual property (I-IP) core to detect control-flow faults that affect CPU execution time. More precisely, this approach aims at detecting those faults that change the expected CPU instruction sequence and that, as a consequence, also change (by increasing or reducing) the expected CPU time allocated for the execution of the monitored task. The underlined advantage of this approach is the ability to detect faults in multitask systems-on-chips (SoCs) running under the control of a real-time (preemptive) operating system. In this multi-task scenario, the I-IP can perform fault detection on a time-shared basis. Practical experiments have been carried out and results are discussed","PeriodicalId":354940,"journal":{"name":"2006 IEEE International Conference on Field Programmable Technology","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126155537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power estimation of a LUT-based MPGA","authors":"Francisco-Javier Veredas, H. Pfleiderer","doi":"10.1109/FPT.2006.270336","DOIUrl":"https://doi.org/10.1109/FPT.2006.270336","url":null,"abstract":"Power consumption is a limiting factor to FPGA viability in applications such as portable devices. LUT-based mask-programmable gate-arrays (LUT-based MPGAs) are alternatives to reach the fast turnaround times of an FPGA with low design cost and low power consumption. A LUT-based MPGA preserves the same logic-structure of a LUT-based FPGA. Unlike FPGAs, the programmable configuration and interconnect is mask-programmable. This paper describes a methodology to estimate power consumption in a LUT-based MPGA. The proposed methodology uses a gate-power estimation tool. The dynamic and static powers of the basic-gates are modeled in a library. The interconnect is easily modeled because the programmable metal-masks are predefined. A comparison with a transistor-level simulation shows an average difference of 20% with the final power result. The experiments show that the major contributor of the power consumption in the MPGA is the clock network. Power results on MPGAs and FPGAs are compared. The dynamic power consumption in the logic is reduced by 73%. The major power reduction is observed in the interconnects. Static power consumption in the LUT-based MPGA is insignificant compared to its dynamic power consumption","PeriodicalId":354940,"journal":{"name":"2006 IEEE International Conference on Field Programmable Technology","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127422636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing floating-point and logarithmic number representations for reconfigurable acceleration","authors":"H. Fu, O. Mencer, W. Luk","doi":"10.1109/FPT.2006.270342","DOIUrl":"https://doi.org/10.1109/FPT.2006.270342","url":null,"abstract":"The paper investigates floating-point and logarithmic number representations for computing with FPGAs. The key issue is to select the best number format for an application to improve performance and accuracy. Using A Stream Compiler (ASC) as the hardware design and compilation tool, a convenient scheme was developed to compare the designs of both floating-point and logarithmic numbers and select the solution with the best performance and accuracy. Its contributions are: (1) optimized function evaluations for conversions between logarithmic and floating-point numbers; (2) design and implementation of logarithmic arithmetic, with optimized segmentation and polynomial degree; (3) a practical comparison case study of Monte Carlo radiative heat transfer simulation. Compared to prior work, our design supports two to six times more LNS conversion and LNS arithmetic units on one FPGA. For Monte Carlo simulation, our designs of both number systems produce 39-80% higher throughput with either a smaller area or a higher accuracy","PeriodicalId":354940,"journal":{"name":"2006 IEEE International Conference on Field Programmable Technology","volume":"156 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120869803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance evaluations of ReconfigME","authors":"G. Wigley, D. Kearney","doi":"10.1109/FPT.2006.270335","DOIUrl":"https://doi.org/10.1109/FPT.2006.270335","url":null,"abstract":"With the development of reconfigurable computers containing FPGAs with in excess of 6 million system-gates, it is now feasible to consider the possibility of sharing the FPGA between multiple concurrently executing applications. This could potentially increase the resource usage of the expensive FPGA logic and decrease response times so users will not have to wait for the FPGA to be completely available. However, the system environment software required to support this may actually result in application performance much less than would be considered acceptable to many FPGA users. This paper involves using a prototype to evaluate the performance of such an operating system, ReconfigME","PeriodicalId":354940,"journal":{"name":"2006 IEEE International Conference on Field Programmable Technology","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130643777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust and real-time automatic target recognition using partial hausdorff distance measure on reconfigurable hardware","authors":"Jinbo Xu, Y. Dou","doi":"10.1109/FPT.2006.270387","DOIUrl":"https://doi.org/10.1109/FPT.2006.270387","url":null,"abstract":"This paper presents a high performance FPGA-based automatic target recognition system, which matches TV templates with an image efficiently. Theoretically, image matching algorithms based on partial Hausdorff distance (HD) are more tolerant of perturbations in the locations of pixel points than other algorithms, but they are too computationally expensive to be used in embedded systems. In order to solve this problem, we present a robust and real-time implementation of the image matching algorithm based on partial HD, taking advantage of the hardware resources offered by the FPGA chips. A parallel target recognition algorithm under constraints of limited embedded memory and limited memory bandwidth is proposed first. And then the system is organized as a coarse-grained pipeline containing three stages. Each stage is implemented in highly parallel fashion. The implementation of distance transform and template matching are described in detail. Experimental results show that our work outperforms related proposals. A speedup of almost 50 is achieved when compared with the software solution on a PC (Pentium 4 2.8 GHz)","PeriodicalId":354940,"journal":{"name":"2006 IEEE International Conference on Field Programmable Technology","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122759026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory support design for LU decomposition on the starbridge hyper-computer","authors":"Seth Young, A. Sudarsanam, A. Dasu, T. Hauser","doi":"10.1109/FPT.2006.270307","DOIUrl":"https://doi.org/10.1109/FPT.2006.270307","url":null,"abstract":"LU matrix decomposition is a linear algebra algorithm used to reduce the complexity required to solve a large system of linear equations. Large systems of equations frequently need to be solved in physics, engineering, and computational chemistry. In the hardware implementation of such LU algorithms supporting modules must be included which handle the transfer of memory between the disk and processing nodes. This paper looks at the data transfer hardware which supports an implementation of a block-based LU algorithm on a multi-FPGA system. Preliminary results are provided which show the required areas and latencies of these designs","PeriodicalId":354940,"journal":{"name":"2006 IEEE International Conference on Field Programmable Technology","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127874911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seed-based genomic sequence comparison using a FPGA/FLASH accelerator","authors":"D. Lavenier, Xinchun Liu, Gilles Georges","doi":"10.1109/FPT.2006.270389","DOIUrl":"https://doi.org/10.1109/FPT.2006.270389","url":null,"abstract":"This paper presents a parallel architecture for computing genomic sequence alignments using seed-based algorithms. Originality comes from the simultaneous use of FPGA components and flash memories. The FPGA technology brings the computer power while the flash memory provides high memory bandwidth able to feed a large array of specific operators. A 64 GBytes flash memory connected to a Xilinx Virtex-2 Pro PCI board has been developed and an array of 160 distance-computation operators have been implemented to perform the first step of seed-based alignment algorithms. Compared to the blast reference software family, we measured a speed-up of 75 on a real intensive genomic sequence comparison application","PeriodicalId":354940,"journal":{"name":"2006 IEEE International Conference on Field Programmable Technology","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125547950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Samuel Bayliss, C. Bouganis, G. Constantinides, W. Luk
{"title":"An FPGA implementation of the simplex algorithm","authors":"Samuel Bayliss, C. Bouganis, G. Constantinides, W. Luk","doi":"10.1109/FPT.2006.270294","DOIUrl":"https://doi.org/10.1109/FPT.2006.270294","url":null,"abstract":"Linear programming is applied to a large variety of scientific computing applications and industrial optimization problems. The Simplex algorithm is widely used for solving linear programs due to its robustness and scalability properties. However, application of the current software implementations of the Simplex algorithm to real-life optimization problems is time consuming when used as the bounding engine within an integer linear programming framework. This work aims to accelerate the Simplex algorithm by proposing a novel parameterizable hardware implementation of the algorithm on an FPGA. Evaluation of the proposed design using real problems demonstrates a speedup of up to 20 times over a highly optimized commercial software implementation running on a 3.4GHz Pentium 4 processor, which is itself 100 times faster than one of the main public domain solvers","PeriodicalId":354940,"journal":{"name":"2006 IEEE International Conference on Field Programmable Technology","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127883077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}