R. Vaslin, G. Gogniat, J. Diguet, R. Tessier, D. Unnikrishnan, K. Gaj
{"title":"Memory security management for reconfigurable embedded systems","authors":"R. Vaslin, G. Gogniat, J. Diguet, R. Tessier, D. Unnikrishnan, K. Gaj","doi":"10.1109/FPT.2008.4762378","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762378","url":null,"abstract":"The constrained operating environments of many FPGA-based embedded systems require flexible security that can be configured to minimize the impact on FPGA area and power consumption. In this paper, a security approach for external memory in FPGA-based embedded systems that exploits FPGA configurability is presented. Our FPGA-based security core provides both confidentiality and integrity for data stored externally to an FPGA which is accessed by a processor on the FPGA chip. The benefits of our security core are demonstrated using four embedded applications implemented on a Stratix II device. Each application requires a collection of tasks with varying memory security requirements. Our security core is used in conjunction with a NIOS II soft processor running the MicroC/OS II operating system. Average memory and energy savings of about 64% and 16%, respectively, are achieved for the four applications versus a non-configurable, uniform security approach.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124174618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defining neighborhood relations for fast spatial-temporal partitioning of applications on reconfigurable architectures","authors":"J. Sim, T. Mitra, W. Wong","doi":"10.1109/FPT.2008.4762374","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762374","url":null,"abstract":"Considering both spatial and temporal partitioning, though potentially profitable, increases the complexity of the design space of applications for run-time reconfigurable architectures. In particular, the number of ways to partition is exponential and dynamic reconfiguration cost is difficult to estimate. These difficulties, such as the sheer size of the design space to be searched and the time taken to evaluate each design point accurately, are particularly challenging for the implementation of neighborhood searches over the design space. In order to address these challenges, this paper presents a framework that enables fast navigation of the design space using any neighborhood search scheme. The key is a neighborhood relation which spans the entire spatial and temporal partitioning design space. Computed over a SEQUITUR compressed loop trace structure, this relation enables the fast estimation of neighboring design points. We implemented two neighborhood searches, hill climbing and tabu search, to evaluate our technique. On four non-trivial benchmarks, these searches are accelerated by up to two orders of magnitude when using our proposed technique while finding optimal results most of the time.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114690260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware acceleration of approximate palindromes searching","authors":"Tomáš Martínek, M. Lexa","doi":"10.1109/FPT.2008.4762367","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762367","url":null,"abstract":"Understanding the structure and function of DNA sequences represents an important area of research in modern biology. Unfortunately, analysis of such data is often complicated by the presence of mutations introduced by evolutionary processes. They increase the time-complexity of algorithms for sequence analysis by introducing an element of uncertainty, complicating their practical usage. One class of such algorithms has been designed to search for palindromes with possible errors (approximate palindromes). The best state-of-the-art methods implemented in software show time-complexity between linear and quadratic, depending on required input parameters. This paper investigates the possibilities for hardware acceleration of approximate palindrome searching and describes a parametrized architecture suitable for chips with FPGA technology. A prototype of the proposed architecture was implemented in VHDL and synthesized for Virtex technology. Application on test sequences shows that the circuit is able to speed up palindrome searching by up to 8000 times in comparison with the best-known software method relying on suffix arrays.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122074363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring hard and soft networks-on-chip for FPGAs","authors":"Rosemary M. Francis, S. Moore","doi":"10.1109/FPT.2008.4762393","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762393","url":null,"abstract":"We present an FPGA architecture with Time Division Multiplexed (TDM) wiring with hard network routers and use this architecture to implement a circuit switched Network-on-Chip. We compare this network to existing approaches: either hard or soft implementations of the network on an FPGA. TDM wiring allows us to address the problem of interfacing high-speed hard routers with slower soft cores. The router area is reduced in favour of more flexible TDM wiring components. Our approach is more power and area efficient than soft networks and more flexible than hard networks.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124860463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delay evaluation of 90nm CMOS multi-context FPGA with shift-register-type temporal communication module for large-scale circuit emulation","authors":"N. Miyamoto, T. Ohmi","doi":"10.1109/FPT.2008.4762419","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762419","url":null,"abstract":"For large-scale circuit emulation using a multi-context FPGA (MC-FPGA), a circuit is divided into multiple sub-circuits, each sub-circuit is assigned to a context, and the MC-FPGA sequentially executes all the contexts one by one. So, the total execution delay is the sum of the delays of the contexts. It is, therefore, said that the total execution delay of the MC-FPGA increases in proportion to the number of contexts used. However, in this paper, we show that the total execution delay remains constant if a shift-register-type temporal communication module (SR-TCM) is used instead of a D flip-flop (D-FF) to implement sequential circuits. The SR-TCM is used not only for signal communication within a sequential circuit, like a D-FF, but also for signal communication from a preceding context to succeeding contexts. In order to quantify the delay, an MC-FPGA named flexible processor (FP), which contains the SR-TCM, has been designed and fabricated in a 90 nm CMOS process technology. From the measurement results, the total execution delay of the FP was kept constant regardless of the number of contexts used.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128314175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concurrent timing based and routability driven depopulation technique for FPGA packing","authors":"Audip Pandit, Lakshmi Easwaran, A. Akoglu","doi":"10.1109/FPT.2008.4762409","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762409","url":null,"abstract":"In the FPGA CAD flow, routability driven algorithms have been introduced to improve feasibility of mapping designs onto the underlying architecture; timing and power driven algorithms have been introduced to meet design specifications. A number of techniques have been proposed to tackle routability, timing or power objectives independently during the clustering stage. However, there is minimal work that targets multiple optimization goals. In this paper, we evaluate a clustering technique that targets routability and timing goals simultaneously. We combine the timing-driven T-VPack algorithm with a routability-driven non-uniform depopulation scheme (T-RDPack). Our technique keeps clusters on the critical path fully populated, while depopulating other clusters in the design. This approach has been implemented in the versatile place and route (VPR) toolset. We show that, compared to T-VPack, channel width reductions of 11.5%, 19.1%, and 24.7% are achieved while incurring area overheads of 0.6%, 3.1%, and 9.1%, respectively, with a negligible increase in critical path delay, exceeding the performance of T-RPack.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129005577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling and compensating for clock skew variability in FPGAs","authors":"N. P. Sedcole, Justin S. J. Wong, P. Cheung","doi":"10.1109/FPT.2008.4762386","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762386","url":null,"abstract":"As integrated circuits are scaled down it becomes difficult to maintain uniformity in process parameters across each individual die. To avoid significant performance loss through pessimistic over-design new design strategies are required that are cognisant of within-die performance variability. This paper examines the effect of process variability on the clock resources in FPGA devices. A model of variation in clock skew in FPGA clock networks is presented. Techniques for reducing the impact of variations on the performance of implemented designs are proposed and analysed, demonstrating that skew variation can be reduced by 70% or more through a combination of phase adjustment and clock rerouting. Measurements on a Virtex-5 FPGA validate the feasibility and benefits of the proposed compensation strategies.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"270 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122946839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable array for transcendental functions calculation","authors":"M. Sima, M. McGuire, Scott Miller","doi":"10.1109/FPT.2008.4762365","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762365","url":null,"abstract":"Expanding transcendental functions in a series of Shift-and-Add operations is an alternative to Taylor or Chebyshev series expansions when fixed-point arithmetic with reduced wordlength is required. Typically, reconfigurable arrays do not provide architectural support for shift operations. Instead, shift operations are emulated by either multiplexing logic or multiplication by a power of 2. In this paper we describe the architecture of a reconfigurable array that can natively support shift operations. Rather than augmenting the reconfigurable fabric with dedicated shift units, the interconnection network is extended with shift capabilities. This is conceptually possible since a shift operation is a rearrangement and not a combination of the signals. Layers of computing tiles supporting Shift-and-Add/Subtract and Add-and-Select operations are interleaved with interconnect layers. On such a reconfigurable array, a variety of transcendental functions can be efficiently implemented.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115852136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time FPGA architecture of extended linear convolution for digital image scaling","authors":"Chung-Chi Lin, M. Sheu, H. Chiang, W. Tsai, Z. Wu","doi":"10.1109/FPT.2008.4762423","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762423","url":null,"abstract":"This paper presents a novel image interpolation method, extended linear interpolation, which is a low-cost architecture with interpolation quality comparable to that of bi-cubic convolution interpolation. An architecture that reduces the computational complexity of generating weighting coefficients is proposed. Our proposed method provides a simple hardware architecture design and low computation cost, and is easy to implement. Compared to the latest bi-cubic hardware design work, the architecture saves about 60% of hardware cost. The presented architecture has been successfully designed and implemented on a Virtex-II FPGA. The simulation results demonstrate that the high-performance architecture of extended linear interpolation, at 104 MHz with 379 LBs, is able to process digital image scaling.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122903118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient FPGA elliptic curve cryptographic processor over GF(2^m)","authors":"S. Antão, R. Chaves, L. Sousa","doi":"10.1109/FPT.2008.4762417","DOIUrl":"https://doi.org/10.1109/FPT.2008.4762417","url":null,"abstract":"In this paper a processor that supports elliptic curve cryptographic applications over GF(2^m) is proposed. The proposed structure is capable of calculating point multiplication and addition using a single coordinate to contain the point information. This compression allows for a better usage of the bandwidth resources. For the point multiplication procedure, all coordinate pre-calculations are completely avoided. This design was successfully prototyped on a reconfigurable device for the field GF(2^163). Experimental results suggest that point multiplication can be performed in 144 μs and point affine addition in 1.02 μs. Compared with the related work, a 5 times speedup is obtained for point addition and multiplication. The presented design offers a well-balanced area-time performance when compared with existing elliptic curve point multiplication specific processors.","PeriodicalId":320925,"journal":{"name":"2008 International Conference on Field-Programmable Technology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122833176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}