Moritz Schmid, Nicolas Apelt, Frank Hannig, J. Teich
{"title":"An image processing library for C-based high-level synthesis","authors":"Moritz Schmid, Nicolas Apelt, Frank Hannig, J. Teich","doi":"10.1109/FPL.2014.6927424","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927424","url":null,"abstract":"We introduce a library for the productive development of image processing accelerators using C-based high-level synthesis. The key concept of our approach is to provide a set of generic building blocks that is applicable to a multitude of image processing applications. An efficient memory architecture that facilitates easy integration of point and local image processing operators is the centerpiece of the library. The generic building blocks are kept very compact and can be tailored to support sophisticated processing techniques. The representation enables the designer to comply with specific design requirements, such as stringent timing constraints or limited resource budgets. Results show a significant gain in productivity compared to hand coded implementation while delivering comparable performance and resource requirements.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126911097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methods for implementation of feedback loops in high speed FPGA applications","authors":"Nima Safari, V. Mauer, S. Gheitanchi","doi":"10.1109/FPL.2014.6927434","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927434","url":null,"abstract":"In many Digital Signal Processing (DSP) modules, increasing the number of pipelining stages to achieve higher throughput may break the module functionality if a feedback-loop exists in the algorithm. This paper addresses a novel algorithmic-level technique to modify implementation of feedback loops to allow deeper pipelining while sustaining the module functionality. An equivalent model for a first-order Infinite Impulse Response (IIR) filter can be obtained by a cascade model including a higher order repeated-pole IIR filter followed by a Finite Impulse Response (FIR) filter. The order of the repeated-pole IIR filters, and hence the number of pipelining stages can be chosen to meet the Fmax requirements. The model is further developed to include a class of mathematical recursive functions to cover many different DSP applications.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128137063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient implementation of a single-precision floating-point arithmetic unit on FPGA","authors":"Wilson Jose, Ana Rita Silva, H. Neto, M. Véstias","doi":"10.1109/FPL.2014.6927391","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927391","url":null,"abstract":"This paper presents a single precision floating point arithmetic unit with support for multiplication, addition, fused multiply-add, reciprocal, square-root and inverse square-root with high-performance and low resource usage. The design uses a piecewise 2nd order polynomial approximation to implement reciprocal, square-root and inverse square-root. The unit can be configured with any number of operations and is capable to calculate any function with a throughput of one operation per cycle. The floating-point multiplier of the unit is also used to implement the polynomial approximation and the fused multiply-add operation. We have compared our implementation with other state-of-the-art proposals, including the Xilinx Core-Gen operators, and conclude that the approach has a high relative performance/area efficiency.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132729437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asynchronously assisted FPGA for variability","authors":"H. S. Low, D. Shang, Fei Xia, A. Yakovlev","doi":"10.1109/FPL.2014.6927398","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927398","url":null,"abstract":"The effect of variability has become increasingly significant as a result of technology geometry scaling. This paper describes Asynchronous Assisting Logic (AAL) blocks and the method of introducing them into modern FPGA architecture, in order to increase tolerance of the wide range latency variations caused by parametric variation, and temperature and supply voltage fluctuations. The proposed method leverages the availability of variation maps and suggests deploying configurable AAL blocks only into the variation critical paths - reinforcing rather rerouting/remapping. This method reduces the size overhead significantly which normally will be incurred by fully asynchronous designs. The proposed technique maintains the existing FPGA architecture allowing potential reuse of design flow. Simulations show correct functionality given regularly variable, randomly variable and capacitor switching energy harvester voltage supplies.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133252954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Amouri, Florent Bruguier, S. Kiamehr, P. Benoit, L. Torres, M. Tahoori
{"title":"Aging effects in FPGAs: an experimental analysis","authors":"A. Amouri, Florent Bruguier, S. Kiamehr, P. Benoit, L. Torres, M. Tahoori","doi":"10.1109/FPL.2014.6927390","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927390","url":null,"abstract":"Modern Field Programmable Gate Arrays (FPGAs) are built using the most advanced technology nodes to meet performance and power demands. This makes them susceptible to various reliability challenges at nano-scale, and in particular to transistor aging. In this paper, an experimental analysis is made to identify the main parameters and phenomena influencing the performance degradation of FPGAs. For that purpose, a set of controlled ring-oscillator-based sensors with different frequencies and tunable activity control are implemented on a Spartan-6 FPGA. Thus, the internal switching activities (SAs) and signal probabilities (SPs) of the sensors can be varied. We performed accelerated-lifetime conditions using elevated temperatures and voltages in a controlled setting to stress the FPGA. A novel monitoring method based on measuring the electromagnetic emissions of the FPGA is used to accurately monitor the performance of the sensors before and after the stress. The experiments reveal the extent of performance degradations, the impact of SPs and SAs, and the relative impacts of BTI and HCI aging factors.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129342987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA architecture support for heterogeneous, relocatable partial bitstreams","authors":"Christophe Huriaux, O. Sentieys, R. Tessier","doi":"10.1109/FPL.2014.6927494","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927494","url":null,"abstract":"The use of partial dynamic reconfiguration in FPGA-based systems has grown in recent years as the spectrum of applications which use this feature has increased. For these systems, it is desirable to create a series of partial bitstreams which represent tasks which can be located in multiple regions in the FPGA fabric. While the transferal of homogeneous collections of lookup-table based logic blocks from region to region has been shown to be relatively straightforward, it is more difficult to transfer partial bitstreams which contain fixed-function resources, such as block RAMs and DSP blocks. In this paper we consider FPGA architecture enhancements which allow for the migration of partial bitstreams including fixed-function resources from region to region even if these resources are not located in the same position in each region. Our approach does not require significant, time-consuming place-and-route during the migration process. We quantify the cost of inserting additional routing resources into the FPGA architecture to allow for easy migration of heterogeneous, fixed-function resources. Our experiments show that this flexibility can be added for a relatively low overhead and performance penalty.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"293 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115079021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixed-architecture process scheduling on tightly coupled reconfigurable computers","authors":"B. K. Hamilton, M. Inggs, Hayden Kwok-Hay So","doi":"10.1109/FPL.2014.6927421","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927421","url":null,"abstract":"The design and implementation of a multitasking runtime system for mixed-architecture applications on a tightly coupled FPGA-CPU platform is presented. The runtime environment and the user applications assume an underlying machine that encompasses multiple computing architectures within a unified machine model. Using this model, a unified process scheduling mechanism was developed that enables concurrent execution of multiple mixed-architecture processes. Scheduling and allocation strategies, including blocking and preemption, were implemented and evaluated with respect to performance and fairness on a Xilinx Zynq platform using a mix of synthetic workloads.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124438143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junyi Liu, Helfried Peyrl, A. Burg, G. Constantinides
{"title":"FPGA implementation of an interior point method for high-speed model predictive control","authors":"Junyi Liu, Helfried Peyrl, A. Burg, G. Constantinides","doi":"10.1109/FPL.2014.6927473","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927473","url":null,"abstract":"In this paper, we present a hardware architecture for implementing an interior point method for model predictive control (MPC) on field programmable gate arrays (FPGA). The FPGA implementation allows the solution of quadratic programs occurring in MPC at very high speed. Experiments show that our hardware implementation is able to outperform an software implementation running on a high-end CPU while consuming significantly less power making it well-suited for embedded industrial control applications. In contrast to existing FPGA implementations, the proposed solution exploits the MPC-specific problem structure with the direct linear equation solver and uses an efficient predictor-corrector algorithm. Moreover, the modular design of the architecture simplifies customization or extension to special control problem classes. The proposed FPGA solution can broaden the applicability of solving complex or large MPC problems in embedded computing platforms that were so far considered out of reach.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128810611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient mapping of mathematical expressions into DSP blocks","authors":"Bajaj Ronak, Suhaib A. Fahmy","doi":"10.1109/FPL.2014.6927419","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927419","url":null,"abstract":"Mapping complex mathematical expressions to DSP blocks through standard inference from pipelined code is inefficient and results in significantly reduced throughput. In this paper, we demonstrate the benefit of considering the structure and pipeline arrangement of DSP blocks during mapping. We have developed a tool that can map mathematical expressions using RTL inference, through high level synthesis with Vivado HLS, and through a custom approach that incorporates DSP block structure. We can show that the proposed method results in circuits that run at around double the frequency of other methods, demonstrating that the structure of the DSP block must be considered when scheduling complex expressions.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125347784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Wonneberger, Max Kohler, W. Derendarz, T. Graf, Rolf Ernst
{"title":"Efficient 3D triangulation in hardware for dense structure-from-motion in low-speed automotive scenarios","authors":"S. Wonneberger, Max Kohler, W. Derendarz, T. Graf, Rolf Ernst","doi":"10.1109/FPL.2014.6927465","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927465","url":null,"abstract":"With the introduction of surround view cameras in modern vehicles and the possibility of calculating dense motion fields in real-time from a moving camera a detailed 3D reconstruction of the static environment is possible (structure-from-motion). Beside the necessity of a motion field between two image frames the task of triangulating those individual 2D point matches to 3D points in the world becomes non real-time on modern CPUs when to be repeated for all image points. In this work we evaluate different approaches to the 3D triangulation optimization problem in a typical structure-from-motion processing chain for an efficient implementation in hardware. An architecture for solving this problem using linear triangulation with an inhomogeneous solution to the equation system is proposed. We evaluate our implementation using FPGAs against a software-implementation with synthetic datasets and from low-speed parking area scenes for numerical accuracy and real-time capabilities. In addition the proposed fixed-point arithmetic implementation is compared against an implementation using floating-point units.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125903201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}