{"title":"Effective FPGA debug for high-level synthesis generated circuits","authors":"Jeffrey B. Goeders, S. Wilton","doi":"10.1109/FPL.2014.6927498","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927498","url":null,"abstract":"High-level synthesis (HLS) promises to increase designer productivity in the face of steadily increasing FPGA sizes, and broaden the market of use, allowing software designers to reap the benefits of hardware implementation. One roadblock to HLS adoption is the lack of a debugging infrastructure. To debug, designers can run their source code on a processor; however, this does not capture interactions with other system components. The alternative is to debug using the RTL, which is beyond the expertise of software designers, and impractical for hardware designers as the RTL may not resemble the original source code.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134166770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A high performance alternating projections image demosaicing hardware","authors":"Hasan Azgin, Serkan Yaliman, Ilker Hamzaoglu","doi":"10.1109/FPL.2014.6927412","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927412","url":null,"abstract":"Since capturing three color channels (red, green, and blue) per pixel increases the cost of digital cameras, most digital cameras capture only one color channel per pixel using a single image sensor. The images pass through a color filter array before being captured by the image sensor. Demosaicing is the process of reconstructing the missing color channels of the pixels in the color filtered image using their available neighboring pixels. Alternating Projections (AP) is one of the highest quality image demosaicing algorithms, and it has very high computational complexity. Therefore, in this paper, a high performance AP image demosaicing hardware is proposed. This is the first AP image demosaicing hardware in the literature. The proposed hardware is implemented using Verilog HDL. The Verilog RTL code is verified to work correctly in a Xilinx Virtex 6 FPGA. The FPGA implementation can process 31 full HD (1920×1080) images per second.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132885249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate power control and monitoring in ZYNQ boards","authors":"A. Beldachi, J. Núñez-Yáñez","doi":"10.1109/FPL.2014.6927415","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927415","url":null,"abstract":"ZYNQ devices combine a dual-core ARM Cortex A9 processor and a FPGA fabric in the same die and in different power domains. In this paper we investigate the run-time power scaling capabilities of these devices using of-the-shelf boards and proposed accurate and fine-grained power control and monitoring techniques. The experimental results show that both software and hardware methods are possible and the right selection can yield different results in terms of control and monitoring speeds, accuracy of measurement, power consumption, and area overhead. The results also demonstrate that significant power margins are available in the FPGA device with different voltage configurations possible. This can be used to complement traditional voltage scaling techniques applied to the processor domain to obtain hybrid energy proportional computing platforms.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131029311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective emulation of permanent faults in ASICs through dynamically reconfigurable FPGAs","authors":"E. Sánchez, L. Sterpone, Anees Ullah","doi":"10.1109/FPL.2014.6927478","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927478","url":null,"abstract":"Hardware fault emulation for Application Specific Integrated Circuits (ASICs) on FPGAs can considerably reduce the time required for the fault simulation. This paper presents a methodology to emulate ASIC faults on state-of-the-art FPGAs. The fault emulation is achieved by following a fully automated process consisting of: constrained technology mapping of ASIC net-list; creation of fault dictionary, generation of faulty partial bit-streams and fault emulation. The proposed approach exploits run-time partial reconfiguration techniques for fault injection and avoids full net-list re-compilations. The method's feasibility is assessed through carefully selected circuits and overhead in terms of area and timing is reported.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133605902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Particle filtering-based Maximum Likelihood Estimation for financial parameter estimation","authors":"Jinzhe Yang, Binghuan Lin, W. Luk, Terence Nahar","doi":"10.1109/FPL.2014.6927411","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927411","url":null,"abstract":"This paper presents a novel method for estimating parameters of financial models with jump diffusions. It is a Particle Filter based Maximum Likelihood Estimation process, which uses particle streams to enable efficient evaluation of constraints and weights. We also provide a CPU-FPGA collaborative design for parameter estimation of Stochastic Volatility with Correlated and Contemporaneous Jumps model as a case study. The result is evaluated by comparing with a CPU and a cloud computing platform. We show 14 times speed up for the FPGA design compared with the CPU, and similar speedup but better convergence compared with an alternative parallelisation scheme using Techila Middleware on a multi-CPU environment.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"508 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115887070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Criticality-aware scrubbing mechanism for SRAM-based FPGAs","authors":"Rui Santos, Shyamsundar Venkataraman, Anup Das, Akash Kumar","doi":"10.1109/FPL.2014.6927476","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927476","url":null,"abstract":"Scrubbing has been considered as an effective mechanism to provide fault-tolerance in Static-RAM (SRAM)-based Field Programmable Gate Arrays (FPGAs). However, the current scrubbing techniques execute without considering the criticality and timing of the user tasks implemented in the FPGA. They often do not execute the scrubbing process in the right instant, which minimizes the probability of each task being executed without transient faults. Moreover, these current solutions are not adapted to the tasks' fault-tolerance requirements, since they may not properly protect the most critical tasks in the system. However, if they do it, they waste resources with the less critical tasks. In this paper, a new scrubbing mechanism is proposed. This new approach adapts the scrubbing mechanism to the tasks' execution, by a proper scheduling and according to their criticality. A proposed heuristic finds a feasible scrubbing schedule for each hardware task. Firstly, the minimum scrubbing periods are computed according to the criticality of each implemented hardware task. Secondly, a proper scrubbing schedule following the EDL (Earliest Deadline as Late as possible) algorithm is found, maximizing the reliability of the system. The experimental results show up to 79% improvements on the system reliability, achieved without wasting scrubbing resources.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116286730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gorker Alp Malazgirt, Hasan Erdem Yantır, A. Yurdakul, S. Niar
{"title":"Application specific multi-port memory customization in FPGAs","authors":"Gorker Alp Malazgirt, Hasan Erdem Yantır, A. Yurdakul, S. Niar","doi":"10.1109/FPL.2014.6927426","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927426","url":null,"abstract":"FPGA block RAMs (BRAMs) offer speed advantages compared to LUT-based memory designs but a BRAM has only one read and one write port. Designers need to use multiple BRAMs in order to create multi-port memory structures which are more difficult than designing with LUT-based multiport memories. Multi-port memory designs increase overall performance but comes with area cost. In this paper, we present a fully automated methodology that tailors our multi-port memory from a given application. We present our performance improvements and area tradeoffs on state-of-the-art string matching algorithms.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"52 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129394619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA acceleration of SAT/Max-SAT solving using variable-way cache","authors":"K. Kanazawa, T. Maruyama","doi":"10.1109/FPL.2014.6927405","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927405","url":null,"abstract":"WalkSAT (WSAT) is a stochastic local search algorithms for Boolean Satisfiability (SAT) and Maximum Boolean Satisfiability (MaxSAT) problems, and it is very suitable for hardware acceleration because of its high inherent parallelism. Formal verification is one of the most important applications of SAT and MaxSAT, however, the size of the formal verification problems is significantly larger than on-chip memory size, and most of the data have to be placed in off-chip DRAM. In this paper, we propose a method to hide the access delay by using on-chip memory banks as a variable-way associative cache memory. The size of data blocks that are frequently fetched from the DRAM considerably varies in the WSAT algorithm. This cache memory aims to hold whole block when it is small enough, and only the head portion when it is large, to hide the DRAM access delay. With this cache memory, up to 60% DRAM access delay can be hidden, and the performance can be improved up to 26%.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128779375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-efficient FPGA layered LDPC decoder with serial AP-LLR processing","authors":"O. Boncalo, A. Amaricai, A. Hera, V. Savin","doi":"10.1109/FPL.2014.6927474","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927474","url":null,"abstract":"This paper proposes an FPGA based layered architecture for quasi-cyclic (QC) irregular LDPC decoder. Our approach is based on merging variable and check node processing into one single variable-check node (VCN) unit. Layer message computation is done using a parallel scheme of a number of VCNs equal to the expansion factor of the QC matrix. The proposed architecture is characterized by the serial processing of the a posteriori LLRs by an FPGA specific high frequency VCN unit implementation using ROM memories. In our approach data conversions as well as additions and comparators are replaced by look-up-tables implemented using distributed RAM. In addition to this, other techniques such as: efficient packaging of LLRs messages and check-node message compression as well as the configurable port width of the FPGA's BRAM are used to reduce BRAM block utilization. Throughput increase is achieved by utilizing techniques such as pipelining, parallel processing of multiple VCNs, as well as relatively high working frequency. Implementation results for the WiMAX (1152, 2304) QC irregular LDPC code indicate that the proposed architecture has up to 3x less slices resource utilization and up to 1 order of magnitude less BRAM blocks with respect to other approaches, while maintaining a throughput of several hundreds of Mbps (800 Mbps coded bits). We achieved this without sacrificing flexibility; therefore we can easily adapt our design to accommodate different code rates.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125217956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware conversion of neural networks simulation models for neural processing accelerator implemented as FPGA-based SoC","authors":"M. Pietras","doi":"10.1109/FPL.2014.6927383","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927383","url":null,"abstract":"The transition from a neural network simulation model to its hardware representation is a complex process, which touches computations precision, performance and effective architecture implementation issues. Presented neural processing accelerator involves neural network sectioning, precision reduction and weight coefficients parsing (arrangements) in order to increase efficiency and maximize FPGA hardware resources utilization. Particular attention has been devoted on to ANN conversion methods designed for a system based on neural processing units and related with this process redundant calculations and empty neurons generation. In addition, this paper describes the FPGA-based Neural Processing Accelerator architecture benchmark for real example implementation of a pattern recognition neural network.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131226458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}