{"title":"An implementation of Blokus Duo player on FPGA","authors":"A. Kojima","doi":"10.1109/FPT.2013.6718429","DOIUrl":"https://doi.org/10.1109/FPT.2013.6718429","url":null,"abstract":"Blokus Duo is a board game for two players, which uses 21 different shapes of tiles and a 14×14 board. We implement a Blokus Duo player design on an FPGA using a hardware-software co-design method. In this paper, we describe the structure and the algorithm used in our Blokus Duo player design. In the current version, alpha-beta pruning and iterative deepening depth-first search run in software, and the evaluation function runs in hardware. The hardware-software co-design implementation is about seven times faster than the original software code.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133510150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA Blokus Duo Solver using a massively parallel architecture","authors":"T. Yoza, R. Moriwaki, Yuki Torigai, Yuki Kamikubo, Takayuki Kubota, Takahiro Watanabe, Takumi Fujimori, Hiroyuki Ito, Masato Seo, Kouta Akagi, Y. Yamaji, Minoru Watanabe","doi":"10.1109/FPT.2013.6718426","DOIUrl":"https://doi.org/10.1109/FPT.2013.6718426","url":null,"abstract":"Recently, many game programs have been implemented in hardware on field programmable gate arrays (FPGAs), because the solution space of games such as Connect6 and Blokus Duo is so large that the computational capabilities of current computers are insufficient to search all possible solutions. This report describes an FPGA acceleration experiment for the Blokus Duo game. The FPGA Blokus Duo Solver was implemented on an Arria II GX FPGA (Altera Corp.). Its operation speed is 25 times faster than C++ based software operation of the same algorithm on a Core i7 processor.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124157643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An FPGA-cluster-accelerated match engine for content-based image retrieval","authors":"Chen Liang, Chen-Mie Wu, Xuegong Zhou, Wei Cao, Shengye Wang, Lingli Wang","doi":"10.1109/FPT.2013.6718404","DOIUrl":"https://doi.org/10.1109/FPT.2013.6718404","url":null,"abstract":"In this paper, a high-performance match engine for content-based image retrieval (CBIR) is proposed. Highly customized floating-point (FP) units are designed to provide the dynamic range and precision of standard FP units, but with considerably less area. Match calculation arrays with various architectures and scales are designed and evaluated. A CBIR system is built on a 12-FPGA cluster. Inter-FPGA connections are based on standard 10-Gigabit Ethernet. The whole FPGA cluster can compare a query image against 150 million library images within 10 seconds, based on detailed local features. Compared with the Intel Xeon 5650 server based solution, our implementation is 11.35 times faster and 34.81 times more power efficient.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114168258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Runtime hardware/software task transition scheduling for data-adaptable embedded systems","authors":"Nathan Sandoval, Casey Mackin, Sean Whitsitt, Roman L. Lysecky, J. Sprinkle","doi":"10.1109/FPT.2013.6718382","DOIUrl":"https://doi.org/10.1109/FPT.2013.6718382","url":null,"abstract":"Data-adaptable reconfigurable embedded systems enable a flexible runtime implementation in which a system can transition the execution of tasks between hardware and software while simultaneously continuing to process data during the transition. Efficient runtime scheduling of task transitions is needed to optimize system throughput and latency of the reconfiguration and transition periods. In this paper, we present and analyze several runtime transition scheduling algorithms and highlight the latency and throughput tradeoffs for an example system.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125910746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An OpenCL optimizing compiler for reconfigurable processors","authors":"J. Nah, Jun Lee, Hongjune Kim, Jinseok Lee, S. Hwang, Donghoon Yoo, Jaejin Lee","doi":"10.1109/FPT.2013.6718351","DOIUrl":"https://doi.org/10.1109/FPT.2013.6718351","url":null,"abstract":"This paper presents simple and efficient optimization techniques for an OpenCL compiler that targets reconfigurable processors. The target architecture consists of a general-purpose processor core and an embedded reconfigurable accelerator with vector units. The accelerator is able to switch its architecture between the VLIW mode and the Coarse Grained Reconfigurable Array (CGRA) mode to achieve high performance. One major problem with this architecture is programming difficulty, for which OpenCL can be a good solution. However, since OpenCL does not guarantee performance portability, hardware-dependent optimization is still necessary. Hence, we develop an OpenCL compiler framework that exploits the mode switching capability and vector units. To measure the effectiveness of the techniques, we have implemented the OpenCL framework and evaluated their performance with fourteen OpenCL benchmark applications.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126863342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Liquid Metal Blokus Duo Design","authors":"E. Altman, Joshua S. Auerbach, D. F. Bacon, Ioana Baldini, P. Cheng, Stephen J. Fink, R. Rabbah","doi":"10.1109/FPT.2013.6718425","DOIUrl":"https://doi.org/10.1109/FPT.2013.6718425","url":null,"abstract":"This paper describes the Liquid Metal entry in the 2013 ICFPT Design Competition. The Liquid Metal system provides a high-level language called Lime and a toolchain targeting FPGAs. Lime allowed us to use standard software development processes for programming, debugging, and performance tuning our FPGA design. We believe such iteration and refinement are far more challenging with low-level languages and design tools commonly used for FPGA development.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125144656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An automated flow for the High Level Synthesis of coarse grained parallel applications","authors":"Vito Giovanni Castellana, Fabrizio Ferrandi","doi":"10.1109/FPT.2013.6718370","DOIUrl":"https://doi.org/10.1109/FPT.2013.6718370","url":null,"abstract":"High Level Synthesis (HLS) provides a way to significantly enhance the productivity of embedded system designers, by enabling the automatic or semiautomatic generation of hardware accelerators starting from high level descriptions with (usually software) programming languages. Typical HLS approaches build a centralized Finite State Machine (FSM) to control the generated datapath, performing the operations according to a pre-determined, static schedule. However, FSM-based approaches are only able to extract parallelism within a single execution flow. In the presence of coarse grained parallelism, in the form of concurrent function calls or parallel control structures, they either serialize all the operations, or build excessively complex controllers, aiming at executing as many operations as possible in a single control step (i.e., they try to extract as much instruction level parallelism as possible). The resulting controllers occupy an excessive amount of area or lead to very low operating frequencies. In this paper we propose a methodology for the HLS of accelerators supporting parallel execution and dynamic scheduling. The approach exploits an adaptive distributed controller, composed of a set of communicating elements associated with each operation. This controller design enables supporting multiple concurrent execution flows, thus increasing parallelism exploitation beyond instruction level parallelism. The approach also supports variable latency operations, such as memory accesses and speculative operations. We apply our methodology on a set of typical HLS benchmarks, and demonstrate valuable speedups with limited area overheads with respect to conventional FSM-based flows.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122038657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing reliability techniques for SoCs with fault tolerant CGRA by using live FPGA fault injection","authors":"Johannes Maximilian Kühn, Thomas Schweizer, Dustin Peterson, T. Kuhn, W. Rosenstiel","doi":"10.1109/FPT.2013.6718415","DOIUrl":"https://doi.org/10.1109/FPT.2013.6718415","url":null,"abstract":"In this work, we intend to demonstrate a number of reliability techniques developed for Coarse Grained Reconfigurable Architectures (CGRA). The techniques to be demonstrated target different portions of a System on Chip (SoC) Design consisting of a general purpose CPU, various accelerators and a CGRA which may be used for application acceleration as well. On the CGRA we will demonstrate a light-weight Triple Modular Redundancy (TMR) technique which mitigates the hardware overhead usually incurred by TMR. In case of a detected CGRA fault, we use Dynamic Remapping of the application to avoid faulty components and thus restore the functionality of the mapped application. On SoC level, we demonstrate Dynamic Functional Verification to sample and thus detect faults in components of the SoC in a time multiplexed manner. The complete system is emulated on a Field Programmable Gate Array (FPGA) for which we developed a fast and accurate fault injection method to test the developed techniques in a live and realistic way.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128362569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An open-source SATA core for Virtex-4 FPGAs","authors":"Cory Gorman, P. Siqueira, R. Tessier","doi":"10.1109/FPT.2013.6718413","DOIUrl":"https://doi.org/10.1109/FPT.2013.6718413","url":null,"abstract":"In this demonstration, we present an open-source Serial ATA core designed for Virtex-4 FPGAs. This core utilizes the RocketIO Multi-Gigabit Transceiver (MGT) of the Virtex-4 to interface with hard drives at SATA Generation 1 (SATA I, 1.5 Gb/s) and Generation 2 (SATA II, 3.0 Gb/s) speeds. A full design hierarchy from host software to the physical layer is provided with the distribution to facilitate design use. A simple, FIFO interface allows for easy integration with other FPGA modules. The demonstration illustrates the correct write and read behavior of the core using a Xilinx ML405 board and a solid state disk. The peak transfer rate of the core for SATA I (130 MB/s) is demonstrated. Our goal for the demonstration is to educate the reconfigurable computing community regarding the availability of the core and to illustrate its capabilities.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131081761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From software threads to parallel hardware in high-level synthesis for FPGAs","authors":"Jongsok Choi, S. Brown, J. Anderson","doi":"10.1109/FPT.2013.6718365","DOIUrl":"https://doi.org/10.1109/FPT.2013.6718365","url":null,"abstract":"We describe the support within high-level hardware synthesis (HLS) for two standard software parallelization paradigms: Pthreads and OpenMP. Parallel code segments, as specified in the software, are automatically synthesized by our HLS tool into parallel-operating hardware sub-circuits. Both data parallelism and task-level parallelism are supported, as is the combined use of both Pthreads and OpenMP. Moreover, our work also provides automated synthesis for commonly occurring synchronization constructs within the Pthreads/OpenMP library: mutual exclusion (mutex) and barriers. Essentially, our framework allows a software engineer to specify parallelism to an HLS tool using methodologies they are likely to be familiar with. An experimental study considers a variety of parallelization scenarios, including demonstrated speedups of up to 12.9× in circuit wall-clock time for the 16-thread case and area-delay product as low as 12% (~8× improvement) when using 4 pipelined hardware threads.","PeriodicalId":344469,"journal":{"name":"2013 International Conference on Field-Programmable Technology (FPT)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132050049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}