{"title":"Identifying homogenous reconfigurable regions in heterogeneous FPGAs for module relocation","authors":"Rico Backasch, G. Hempel, Stefan Werner, Sven Groppe, Thilo Pionteck","doi":"10.1109/ReConFig.2014.7032533","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032533","url":null,"abstract":"Relocation of partial bitstreams has been a focus of research for many years. Several design flows for module relocation have been proposed in the past. In general, these design flows start with a manual identification of equally sized and structured partially reconfigurable regions. Due to the increasing heterogeneity and complexity of modern FPGAs, such manual approaches become impractical. This work presents an automated approach for identifying suitable regions for relocatable modules. The algorithm identifies the optimal resource pattern that fits most often on the FPGA device for a given resource requirement. Compared to standard methods that search for repeated instances of a manually specified, fixed resource pattern, using the resource requirements as the starting point increases the number of identically structured regions that can be found.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133662153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and generic hardware architecture for stereo block matching applications on embedded systems","authors":"K. Häublein, M. Reichenbach, D. Fey","doi":"10.1109/ReConFig.2014.7032518","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032518","url":null,"abstract":"Even with the tremendous performance increase of microprocessor architectures in recent years, real-time capturing and computing of stereo images remains a challenging task, particularly in the field of embedded image processing. The stereo block matching technique allows hardware designers to parallelize the process of depth map calculation. Additionally, for smart camera designers it is also crucial to adapt hardware architectures to different FPGA platforms, sensor properties, throughput, and accuracy requirements. However, most application-specific implementations of this technique are fixed to a single camera setup to achieve high frame rates and lack flexibility in these properties. A general approach for a stereo block matching model that is also able to process high-resolution images in real time is still missing. Therefore, we present a new generic VHDL template for fast window-based stereo block matching correlation. It is fully scalable in functional parameters such as image size, window size, and disparity range. Its streaming character even allows HD images to be computed in real time. An interface for a flexible processing element (PE) structure is also provided. This enables the hardware designer to apply a custom-made cost function, which performs a correlation between the target windows and the reference window. The developer is also able to adapt the model to the available sensor speed and FPGA resource limitations. 
These features should help designers to find the right trade-off between depth map quality and available hardware resources.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116012734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory optimisation for hardware induction of axis-parallel decision tree","authors":"C. Cheng, C. Bouganis","doi":"10.1109/ReConFig.2014.7032538","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032538","url":null,"abstract":"In data mining and machine learning applications, the Decision Tree classifier is widely used as a supervised learning method, not only as a stand-alone model but also as part of ensemble learning techniques (e.g. Random Forest). The induction of Decision Trees (i.e. the training stage) involves intense memory communication and inherently parallel processing, making FPGA devices a promising platform for accelerating the training process thanks to the high memory bandwidth provided by their embedded memory blocks. However, peak memory bandwidth is reached only when all the channels of the block RAMs on the FPGA are free for concurrent communication, whereas, to accommodate large data sets, several block RAMs are often combined, which makes a number of memory channels unavailable. Therefore, efficient use of the embedded memory is critical not only for allowing larger training datasets to be processed on an FPGA but also for making as many memory channels as possible available to the rest of the system. In this work, a data compression scheme is proposed for the training data stored in the embedded memory to improve the memory utilisation of the device, specifically targeting the axis-parallel decision tree classifier. The proposed scheme takes advantage of the nature of the decision tree induction problem and improves the memory efficiency of the system without any compromise on the performance of the classifier. 
It is demonstrated that the scheme can reduce the memory usage by up to 66% for the training datasets under investigation without compromise in training accuracy, while a 28% reduction in training time is achieved due to extra processing power enabled by the additional memory bandwidth.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117080516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can high-level synthesis compete against a hand-written code in the cryptographic domain? A case study","authors":"Ekawat Homsirikamol, K. Gaj","doi":"10.1109/ReConFig.2014.7032504","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032504","url":null,"abstract":"This paper investigates the state of current high-level synthesis (HLS) tools by using Xilinx Vivado HLS to design a cryptographic module based on the Advanced Encryption Standard. The obtained results are compared with those for hand-written Register-Transfer Level (RTL) VHDL code to determine the suitability of the HLS-based approach for implementing cryptographic algorithms in hardware. Our study has shown that the RTL-based approach still outperforms the HLS-based approach due to the flexibility in designing a control unit, which affects the throughput of the circuit. Nevertheless, the HLS-based approach can successfully compete with the RTL-based approach in terms of area and maximum clock frequency.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132277144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient FPGA-based implementation of a CAZAC sequence generator for 3GPP LTE","authors":"F. A. P. Figueiredo, Fabiano S. Mathilde, Fabbryccio A. C. M. Cardoso, Rafael M. Vilela, J. P. Miranda","doi":"10.1109/ReConFig.2014.7032513","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032513","url":null,"abstract":"This paper presents a configurable and optimized hardware architecture for computing Zadoff-Chu (ZC) complex sequences in the frequency domain. It is a hardware-efficient and accurate architecture for computing ZC sequences in realtime. The architecture is mainly based on the CORDIC algorithm for computing complex exponentials using only shift and add operations. Due to transformations applied to the Zadoff-Chu equation it is possible to eliminate the use of multipliers with non-constant terms. This hardware architecture is employed by the Physical Random Access Channel (PRACH) in LTE and LTE-A systems during the reception and detection of random access preambles. Its main advantage is that it eliminates the need for storing a large number of long complex ZC sequences. Simulation results show that the proposed architecture is accurate, efficient and renders the resulting PRACH receiver fully compliant with 3GPP's detection requirements.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130442580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PoC-align: An open-source alignment accelerator using FPGAs","authors":"Thomas B. Preußer, Oliver Knodel, R. Spallek","doi":"10.1109/ReConFig.2014.7032548","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032548","url":null,"abstract":"The mapping of reads, i.e. short DNA base pair strings, to large genome databases has become a critical operation for genetic analysis and diagnosis. The underlying alignment operation is essentially a string search that tolerates some character mismatches and possibly character deletions or insertions with respect to a reference genome. Its output comprises the locations within the reference that are likely to correspond to the mapped DNA snippet. This paper describes PoC-Align, an alignment infrastructure using FPGA accelerators. It is an extension of our preceding FPGA aligner [1], which has been enhanced to tolerate alignment gaps (insertions and deletions) and to be more customizable through generic parameters. In addition to describing the implementation of these extensions, we also outline the enhancements, mainly carried out in software, that are implemented on top of the FPGA accelerator, such as support for mapping paired-end reads. 
By providing a thorough overview of the complete infrastructure, we aim to publicize the release of the sources of our solution and hope to encourage other groups to use and extend this platform.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128662241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unified OpenCL-flavor programming model with scalable hybrid hardware platform on FPGAs","authors":"Hongyuan Ding, Miaoqing Huang","doi":"10.1109/ReConFig.2014.7032563","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032563","url":null,"abstract":"Hardware accelerators are capable of achieving significant performance improvement. However, designing hardware accelerators lacks flexibility and productivity. Combining hardware accelerators with a multiprocessor system-on-chip (MPSoC) is an alternative way to balance flexibility, productivity, and performance. In this work, we present a unified hybrid OpenCL-flavor (HOpenCL) parallel programming model on MPSoC supporting both hardware and software kernels. By integrating the HOpenCL hardware IPs and software libraries, the same kernel function can execute either as a hardware kernel on the dedicated hardware accelerators or as a software kernel on the general-purpose processors. Using the automatic design flow, the corresponding hybrid hardware platform is generated along with the executable. We use a 512×512 matrix multiplication to examine the potential of our hybrid system in terms of performance, scalability, and productivity. The results show that hardware kernels reach more than 10 times speedup compared with the software kernels. Our prototype platform also demonstrates good performance scalability as the number of group computation units (GCUs) increases from 1 to 6, until the problem becomes memory-bound. Compared with the hard ARM core on the Zynq 7045 device, we find that the performance of one ARM core is equivalent to 2 or 3 GCUs with software kernel implementations. 
On the other hand, a single GCU with hardware kernel implementation is 5 times faster than the ARM core.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123735720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hardware generator for factor graph applications","authors":"James Demma, P. Athanas","doi":"10.1109/ReConFig.2014.7032490","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032490","url":null,"abstract":"A Factor Graph (FG-http://en.wikipedia.org/wiki/Factor_graph) is a structure used to find solutions to problems that can be represented as a Probabilistic Graphical Model (PGM). FGs consist of interconnected variable nodes and factor nodes, which iteratively compute and pass messages to each other. FGs can be applied to problems such as decoding forward error correcting codes, Markov chains and Markov Random Fields, Kalman Filtering, Fourier Transforms, and even some games such as Sudoku. In this paper, a framework is presented for rapid prototyping of hardware implementations of FG-based applications. The FG developer specifies aspects of the application, and the framework returns a design. A system of Python scripts and Verilog Hardware Description Language templates is used to generate the HDL source code for the application. The generated designs are vendor/platform agnostic, but currently target the Xilinx Virtex-6-based ML605. The framework has so far been primarily applied to construct Low Density Parity Check (LDPC) decoders. The characteristics of a large basket of generated LDPC decoders, including contemporary 802.11n decoders, have been examined as a verification of the system and as a demonstration of its capabilities. 
As a further demonstration, the framework has been applied to construct a Sudoku solver.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129699751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA-based accelerator development for non-engineers","authors":"David Uliana, P. Athanas, Krzysztof Kepa","doi":"10.1109/ReConFig.2014.7032522","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032522","url":null,"abstract":"In today's world of big-data computing, access to massive, complex data sets has reached an unprecedented level, and the task of intelligently processing such data into useful information has become a growing concern for the high-performance computing community. However, domain experts, who are the brains behind this processing, typically lack the skills required to build the FPGA-based hardware accelerators ideal for their applications, as traditional development flows targeting such hardware require digital design expertise. This work proposes a usable, end-to-end accelerator development methodology that attempts to bridge this gap between domain experts and the vast computational capacity of FPGA-based heterogeneous platforms. To accomplish this, a development flow was assembled, targeting the Convey Hybrid-Core HC-1 heterogeneous platform and utilizing an existing graphical design environment for design entry. The efficacy of the flow in extending FPGA-based acceleration to non-engineers in the life sciences was informally tested at an NSF-funded summer workshop, organized and hosted by a bioinformatics organization at a particular university. 
A group of five life-science-focused, non-engineer participants made significant modifications to a bare-bones Smith-Waterman accelerator, extending its functionality and improving performance.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125911553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA-based reconfigurable unit for real-time power quality index estimation","authors":"M. Lopez-Ramirez, L. Ledesma-Carrillo, Ana L. Martinez-Herrera, E. Cabal-Yépez, H. Miranda-Vidales","doi":"10.1109/ReConFig.2014.7032521","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032521","url":null,"abstract":"Power quality monitoring is an important subject of investigation and research. In this work, a generic FPGA-based portable architecture for real-time power quality index (PQI) estimation is proposed. In contrast to off-the-shelf specialized equipment, the proposed hardware implementation offers higher accuracy for PQI estimation and for the representation of voltage and current signals. Unlike previous approaches, the proposed FPGA-based PQI computation unit estimates up to fourteen PQIs, is highly portable to different platforms, and can be implemented on a single chip.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"76 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126130237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}