{"title":"Real-Time Object Detection on 640x480 Image With VGG16+SSD","authors":"Hyeong-Ju Kang","doi":"10.1109/ICFPT47387.2019.00082","DOIUrl":"https://doi.org/10.1109/ICFPT47387.2019.00082","url":null,"abstract":"Convolutional neural networks (CNNs) show high performance in computer vision tasks including object detection, but their large weight storage and computation requirements prohibit real-time processing at 30 frames per second (FPS). This demonstration will show a CNN accelerator that can perform real-time object detection on 640x480 images. A high-performance, complex CNN, the single-shot multibox detector (SSD) with VGG16, was implemented. The number of weights is reduced by a pruning scheme. For higher utilization of operators, accelerator-aware pruning was applied. The weights of the pruned network can be entirely stored in the internal memory. The proposed design reaches 42 FPS on an XC7VX690T FPGA, with a VOC07 test mAP of 78.13%.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124714502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Dang, Farnoud Farahmand, Michal Andrzejczak, K. Gaj
{"title":"Implementing and Benchmarking Three Lattice-Based Post-Quantum Cryptography Algorithms Using Software/Hardware Codesign","authors":"V. Dang, Farnoud Farahmand, Michal Andrzejczak, K. Gaj","doi":"10.1109/ICFPT47387.2019.00032","DOIUrl":"https://doi.org/10.1109/ICFPT47387.2019.00032","url":null,"abstract":"It has been predicted that within the next ten to fifteen years, quantum computers will have computational power sufficient to break current public-key cryptography schemes. When that happens, all traditional methods of dealing with the growing computational capabilities of potential attackers, such as increasing key sizes, will be futile. The only viable solution is to develop new standards based on algorithms that are resistant to quantum computer attacks and capable of being executed on traditional computing platforms, such as microprocessors and FPGAs. Leading candidates for new standards include lattice-based post-quantum cryptography (PQC) algorithms. In this paper, we present the results of implementing and benchmarking three lattice-based key encapsulation mechanisms (KEMs) that have progressed to Round 2 of the NIST standardization process. Our implementations are based on a software/hardware codesign approach, which is particularly applicable to the current stage of the NIST PQC standardization process, where the large number and high complexity of the candidates make traditional hardware benchmarking extremely challenging. We propose and justify the choice of a suitable system-on-chip platform and design methodology. The obtained results indicate the potential for very substantial speed-ups vs. purely software implementations, reaching 28x for encapsulation and 20x for decapsulation.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132668936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhiqiang Que, Yanyang Liu, Ce Guo, Xinyu Niu, Yongxin Zhu, W. Luk
{"title":"Real-Time Anomaly Detection for Flight Testing Using AutoEncoder and LSTM","authors":"Zhiqiang Que, Yanyang Liu, Ce Guo, Xinyu Niu, Yongxin Zhu, W. Luk","doi":"10.1109/ICFPT47387.2019.00072","DOIUrl":"https://doi.org/10.1109/ICFPT47387.2019.00072","url":null,"abstract":"Flight testing is crucial in validating the functionality and safety in new commercial aircraft design before mass production. The challenge is to support real-time analysis of high-dimensional time series data generated from tens of thousands of sensors around the aircraft during test flights. We propose a novel 2-stage approach, using a fine-tuned autoencoder to extract the generic underlying features of high-dimensional data, followed by a stacked LSTM using the learned features to predict aircraft time series and to detect anomalies in real-time for flight testing. A novel Timestep(TS)-buffer is introduced to avoid redundant calculations of LSTM gate operations to reduce system latency. Compared with a software implementation of the AutoEncoder-LSTM on CPU and GPU, our FPGA design is respectively 36.3 and 23.9 times faster and consumes 247 and 499 times less energy.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130102117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Takeshi Ohkawa, S. Tayama, H. Mori, Do-Hyeong Lee, Hayato Amano, Itsuki Hirakawa, Mikiko Sato, Harumi Watanabe
{"title":"Design and Development of Networked Multiple FPGA Components for Autonomous Tiny Robot Car","authors":"Takeshi Ohkawa, S. Tayama, H. Mori, Do-Hyeong Lee, Hayato Amano, Itsuki Hirakawa, Mikiko Sato, Harumi Watanabe","doi":"10.1109/ICFPT47387.2019.00096","DOIUrl":"https://doi.org/10.1109/ICFPT47387.2019.00096","url":null,"abstract":"This article presents the design and the development of a system composed of multiple FPGA components for autonomous robot cars. In the FPT'19 FPGA design competition, there are several tasks for image recognition such as driving lane, traffic signal, human or object on the road. To promote team development, component-oriented FPGA development is employed. In this paper, we describe the whole system design by integrating FPGA components for autonomous driving, the design of each FPGA component and its development process.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114354253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ZFP-V: Hardware-Optimized Lossy Floating Point Compression","authors":"Gongjin Sun, S. Jun","doi":"10.1109/ICFPT47387.2019.00022","DOIUrl":"https://doi.org/10.1109/ICFPT47387.2019.00022","url":null,"abstract":"Lossy floating point compression algorithms are critical components of reducing the cost and improving the performance of many modern applications, including machine learning and scientific computing. Data compression is widely used to reduce data storage requirements and transfer overhead, but traditional data-oblivious lossless compression schemes are very inefficient for floating point data. On the other hand, recently proposed lossy compression algorithms like ZFP and SZ achieve very high rates of compression while controlling the tolerable error margin. To the best of our knowledge, no efficient hardware implementation of ZFP exists yet, partially due to the inherently serial nature of the algorithm. In this paper, we present the design and implementation of ZFP-V, which identifies the serial portion of the ZFP algorithm and modifies it for more efficient hardware implementation. ZFP-V replaces the \"group testing\" part of ZFP with a variable-length header, which allows our hardware implementation to achieve up to 2x performance improvement compared to our best-effort hardware implementation of the original algorithm while using less on-chip resources, at a marginal reduction of compression ratio. We evaluate an OpenCL implementation of ZFP-V on an Intel Arria 10 FPGA using a variety of real-world scientific datasets, and show a single-pipeline throughput of 1 GB/s – 4 GB/s compression and 2 GB/s – 10 GB/s decompression on real-world datasets. Our implementation often outperforms a 32-thread software implementation on a high-end Intel Xeon CPU, and significantly outperforms a state-of-the-art FPGA implementation of SZ.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114359609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tomonari Tanaka, I. Ikeno, Riku Tsuruoka, Takumi Kuchiba, Wang Liao, Y. Mitsuyama
{"title":"Development of Autonomous Driving System Using Programmable SoCs","authors":"Tomonari Tanaka, I. Ikeno, Riku Tsuruoka, Takumi Kuchiba, Wang Liao, Y. Mitsuyama","doi":"10.1109/ICFPT47387.2019.00091","DOIUrl":"https://doi.org/10.1109/ICFPT47387.2019.00091","url":null,"abstract":"We propose an autonomous driving system using programmable SoCs. Our system consists of two FPGA boards of Zybo Z7-20, three cameras, and one motor driver of H-bridge circuits with independent left and right motors. The main part of our system is divided into two blocks: driving control and object detection, which are implemented on each Zybo Z7-20. Within the development framework of programmable SoC of Zynq 7000, we adopt a HW/SW co-design to balance the design period and system performance to satisfy the goals of the FPGA design competition.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116500916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shimon Kudaka, Ai Suzuki, Natsumi Yamada, Noriki Oshiro, Taichi Miyagi, Yasunori Osana
{"title":"Self-Driving Car Application of a Stream-Oriented Accelerator Framework","authors":"Shimon Kudaka, Ai Suzuki, Natsumi Yamada, Noriki Oshiro, Taichi Miyagi, Yasunori Osana","doi":"10.1109/ICFPT47387.2019.00086","DOIUrl":"https://doi.org/10.1109/ICFPT47387.2019.00086","url":null,"abstract":"In our self-driving car design for the FPT'19 design competition, a stream-oriented acceleration framework is ported to Zynq SoC to enable easy co-design of software and hardware. The framework, OpenFC, is primarily intended for easily building a multi-FPGA acceleration cluster with HLS programmability, but is also powerful for embedded acceleration with multiple acceleration modules for image recognition. With the OpenFC framework, programmers can send and receive data streams between the microprocessor and accelerator modules called SPEs (streaming processing elements). This paper briefly describes our self-driving car design with the framework.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121869001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Coarse-Grained Reconfigurable Architecture with a Fault Tolerant Non-Volatile Configurable Memory","authors":"Takeharu Ikezoe, Takuya Kojima, H. Amano","doi":"10.1109/ICFPT47387.2019.00018","DOIUrl":"https://doi.org/10.1109/ICFPT47387.2019.00018","url":null,"abstract":"Recent IoT devices require extremely low standby power consumption, while a certain performance is needed during the active time, and Coarse Grain Reconfigurable Arrays (CGRAs) are suitable because of their high energy efficiency. However, even in CGRAs, the leakage power for its configuration memory must be reduced. Although power gating is a popular technique, the data in flip-flops and memory are lost, so they must be retrieved after the wake-up. Recovering everything requires numerous state transitions and considerable overhead both on its execution time and energy. To address the problem, Non-volatile Cool Mega Array (NVCMA), a CGRA providing non-volatile flip-flops (NVFFs) with spin transfer torque type non-volatile memory (NVM) technology, has been developed. However, in general, non-volatile memory technologies have problems with reliability. Some NVFFs are stuck-at-0/1 and, with a certain probability, cannot store the data. To improve the chip yield, we propose a mapping algorithm to avoid faulty processing elements of the CGRA caused by the erroneous configuration data. Then, we also propose a method to add an error-correcting code (ECC) mechanism to NVFFs used for the configuration and constant memory. The proposed method was applied to NVCMA to evaluate the availability rate and reduction of write time. By using both methods, a 99.4% availability ratio is achieved with 0.1% probability of faulty FFs, while almost no chips are available without using them. The energy for storing data becomes about 2.28 times because of the hardware overhead of ECC but the proposed method can save 11.1% of the storing energy on average.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128074642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HILL: A Hardware Isolation Framework Against Information Leakage on Multi-Tenant FPGA Long-Wires","authors":"Yukui Luo, Xiaolin Xu","doi":"10.1109/ICFPT47387.2019.00060","DOIUrl":"https://doi.org/10.1109/ICFPT47387.2019.00060","url":null,"abstract":"FPGAs have recently been deployed in the multi-tenant cloud to provide high-performance computing capabilities. Such deployment of FPGAs creates a new attack surface for adversaries. It has been recently demonstrated that the capacitive crosstalk between FPGA long-wires can be used as a side-channel to extract secret information. In this paper, we present HILL: a Hardware Isolation framework against information Leakage on multi-tenant FPGA Long-wires. As a defense framework, HILL can prioritize the placement and routing of security-critical hardware instances and isolate them from other parts and tenants. For data and communication interfaces that use FPGA long-wires, such as UART, PCIe, and AXI4, HILL employs a long-wire obfuscation technique to reduce the side-channel leakage. We evaluate the performance of HILL with Xilinx Artix-7 FPGAs using two prevalent FPGA development tools: Xilinx ISE 14.7 and Vivado 2018.3. The experimental results demonstrate that HILL can effectively reduce the crosstalk-caused side-channel leakage by 138 times. The long-wire obfuscation technique reduces the correlation between the side-channel leakage and secret key from 81.7% to 50.3%, which is close to a random guess.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121681086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ASAP: Automatic Sizing and Partitioning for Dynamic Memory Heaps in High-Level Synthesis","authors":"Nicholas V. Giamblanco, J. Anderson","doi":"10.1109/ICFPT47387.2019.00046","DOIUrl":"https://doi.org/10.1109/ICFPT47387.2019.00046","url":null,"abstract":"Efficient high-level synthesis (HLS) of dynamic memory allocation techniques (malloc() and free()) simplifies the compilation of algorithms with runtime-varying memory requirements to hardware designs. Existing HLS memory allocation frameworks often degrade performance and area, while simultaneously introducing even more parameters to optimize (e.g. heap depth, heap assignments to program logic). We address these concerns with ASAP (Automatic Sizing and Partitioning), a dynamic memory allocation framework for HLS tools. ASAP provides (1) automatic heap depth selection through dynamic analysis of an application, and (2) automatic heap partitioning (through static analysis) to provide parallelism from program logic to memory, improving performance. We demonstrate that ASAP is able to improve performance and reduce cycle latencies compared with non-heap-partitioned designs, with speed-ups up to ~ 5× when applied to common memory patterns, and up to ~ 2× improvement when applied to a suite of dynamic-memory intensive applications.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122454243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}