2018 International Conference on Field-Programmable Technology (FPT): Latest Publications

Unified On-Chip Software and Hardware Debug for HLS-Accelerated Programs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00072
M. Ashcraft, Jeffrey B. Goeders
Abstract: Modern high-level synthesis (HLS)-based tools allow for the creation of complex systems where parts of the user's software are executed on a conventional processor, and the other parts are implemented as hardware accelerators via HLS flows. While modern tools allow designers to construct these systems relatively quickly, observing and debugging the real-time execution of these complex systems remains challenging. Recent academic work has focused on providing designers software-like visibility into the execution of their HLS hardware accelerators; however, this work has assumed that the hardware is observed in isolation. In this work we demonstrate techniques toward a unified in-system software and hardware debugging environment, where the user can capture execution of both the hardware and software domains, and their interactions. We present the performance costs of capturing this execution data, exploring the impact of different levels of observation.
Citations: 2
FPGA Design for Autonomous Vehicle Driving Using Binarized Neural Networks
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00091
Kaijie Wei, Koki Honda, H. Amano
Abstract: We propose an autonomous vehicle controlled by FPGAs. In our design, considering embedded systems, we apply binarized neural networks (BNNs), which can achieve satisfying results with high speed and accuracy, to recognize pedestrians and some obstacles on a given road. To detect the traffic light, a passive camera-based pipeline is applied. Furthermore, the implementation of road lane detection is based on a color selection algorithm, Canny Edge Detection, and Hough Transformation. The proposed design is realized by two Xilinx boards: PYNQ-Z1 and Zynq-Xc7Z010. These two FPGA boards cooperate with each other through a shared network cable. In the proposed design, the resources used by the Zynq-Xc7Z010 can be greatly reduced, and the inference time on the FPGA is thousands of times faster than the software implementation.
Citations: 5
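The abstract above does not describe how the authors' BNN is built. As a generic illustration only, the sketch below shows the well-known XNOR-popcount trick that makes binarized networks cheap in hardware: when weights and activations are restricted to +1/-1 and packed as bits, a dot product reduces to counting the positions where the bits agree. The function names `pack_bits` and `binarized_dot` are hypothetical and chosen for this example; nothing here is taken from the paper.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binarized_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors given as packed bit words."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask      # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")        # popcount of matching positions
    # Each match contributes +1 and each mismatch -1: matches - (n - matches).
    return 2 * matches - n


# Illustrative vectors (assumed data, not from the paper).
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))           # ordinary dot product
result = binarized_dot(pack_bits(a), pack_bits(b), len(a))
```

On an FPGA the same computation maps to XNOR gates followed by a popcount tree, which is one reason binarized layers tend to use far fewer resources than multiply-accumulate units.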
A System-Level Transprecision FPGA Accelerator for BLSTM Using On-chip Memory Reshaping
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00068
D. Diamantopoulos, C. Hagleitner
Abstract: The large amount of processing and storage of modern neural networks challenges engineers to architect dedicated and tailored hardware with high energy efficiency. At the inflection point of choosing the most appropriate acceleration platform, FPGAs offer a competitive advantage with their irregular parallelism and bit-level re-programmability, at the cost of development effort. One critical problem is the lack of a common development flow between CPU and FPGA that combines the advantages of both the software and hardware worlds, i.e. integrated programmability and adaptable acceleration. This work presents a system-level FPGA implementation framework for BLSTM-based neural network acceleration that introduces a) flexible reduced-precision (transprecision) data-paths and b) on-chip memory reshaping for storing model parameters. By evaluating the proposed architecture on an OCR application, it was possible to decrease the energy-to-solution by 21.9x and 2.6x compared to that of a POWER8 processor and a P100 GPU, respectively.
Citations: 6
Very Massive Hardware Merge Sorter
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00023
M. Saitoh, Kenji Kise
Abstract: The state-of-the-art hardware merge sorter called MMS has a tie-record issue: records having the same key can cause a problem. MMS solves this issue with an inefficient scheme that compares both the key and satellite data fields of records to determine whether two records are swapped or not. We propose a high-performance hardware merge sorter (VMS) which adopts an efficient solution to the issue by comparing just the key fields. We also present the detailed circuit of VMS, which adopts some implementation optimizations. We implement and evaluate VMS on a Virtex-7 FPGA. The evaluation results show that our proposed merge sorter requires fewer hardware resources and achieves 1.44x better throughput than MMS when large records are used.
Citations: 8
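The abstract above does not give the VMS circuit, so the following is only a software analogy, not the authors' design. It sketches what "comparing just key fields" can look like in a two-way merge: records are `(key, payload)` pairs, the comparison never inspects the payload, and ties are resolved by a fixed rule (the left input wins). The function name `merge_by_key` and the sample records are assumptions made for this example.

```python
def merge_by_key(left, right):
    """Merge two key-sorted lists of (key, payload) records, comparing keys only.

    On equal keys the record from `left` is emitted first, so ties are
    resolved deterministically without ever reading the payload field.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j][0] < left[i][0]:   # strict comparison on the key alone
            out.append(right[j])
            j += 1
        else:                          # equal or smaller key: take from left
            out.append(left[i])
            i += 1
    out.extend(left[i:])               # at most one of these is non-empty
    out.extend(right[j:])
    return out


# Illustrative records (assumed data): three records share the key 3.
left = [(1, "a"), (3, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (4, "z")]
merged = merge_by_key(left, right)
```

Because the comparator width depends only on the key, widening the payload does not widen the comparison, which is consistent with the abstract's observation that the benefit shows up when large records are used.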
Transparent Acceleration of Image Processing Kernels on FPGA-Attached Hybrid Memory Cube Computers
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00069
Md Jubaer Hossain Pantho, Joel Mandebi Mbongue, C. Bobda, D. Andrews
Abstract: The Hybrid Memory Cube (HMC) is representative of emerging architectures that integrate FPGAs with multichannel interconnected 3-D stacked memory, offering great potential for high bandwidth streaming applications. However, creating new hardware components that tap the full potential of the concurrent communications channels requires a structural understanding of the memory layout and interconnect configurations. In this paper, we present a new development framework aimed at removing the need for software programmers to understand the underlying physical architecture. The proposed framework automates the creation of hardware/software co-designs for computer vision applications in a way that is transparent to the developer. The development system dynamically detects function calls in software kernels and replaces those calls by a hardware wrapper function that exploits the HMC's memory hierarchy and multichannel interconnect with the FPGA. Results show our flow can exploit the 3-D stacked memory and concurrent communications channels to achieve speed-up with no need to tune the original software application to the memory hierarchy.
Citations: 2
A Short-Transfer Model for Tightly-Coupled CPU-FPGA Platforms
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00075
Alexander Kroh, O. Diessel
Abstract: Due to the cost of repeated data movement between CPU and FPGA, the use of FPGA-based accelerators has traditionally been limited to offloading long-running tasks from the CPU to programmable logic. Although modern heterogeneous platforms, such as Zynq and HARP, reduce the costs of CPU-FPGA data transfers, the traditional offload model is cemented as the popular choice. For these systems to become truly heterogeneous, the utilisation of all computational resources should be optimised. In particular, the CPU and FPGA should cooperate by dividing the workload between them so as to maximize system throughput. We first derive a model that predicts the optimum partitioning of a workload between hardware and software. We then measure the performance of short transfers between CPU and FPGA on the Zynq CPU-FPGA platform. Such transfers are essential to efficiently synchronise between cooperating hardware and software tasks. Finally, we demonstrate how our derived model can be used to choose the optimum workload partitioning to within 8% of the optimum for an accumulator task and predict its execution time within 12%.
Citations: 1
Efficient Inter-Kernel Communication for OpenCL Database Operators on FPGAs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00050
Tobias Drewes, J. Joseph, B. Gurumurthy, David Broneske, G. Saake, Thilo Pionteck
Abstract: Many modern database engines use OpenCL to target heterogeneous hardware. Queries are evaluated by execution of chains of low-level operators. The common paradigm for OpenCL workloads facilitates communication between kernels using buffers in off-chip memory. This poses a severe performance limitation due to weak memory systems of FPGAs in contrast to the memory hierarchy available in CPUs and GPUs. To overcome this bottleneck, we propose the use of structural optimizations of kernel code. On-chip pipelining and code fusion are analyzed as alternatives to buffer-based inter-kernel communication. We assess the impact on resource utilization and system throughput and thereby demonstrate that properly structured code achieves a speedup of more than 4x over the default paradigm. This shows that it is essential for chains of kernels to consider not only optimization techniques for individual kernels, but also optimization of inter-kernel communication.
Citations: 2
FPGA Acceleration of a Supervised Learning Method for Hyperspectral Image Classification
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00051
Kento Tajiri, T. Maruyama
Abstract: Hyperspectral image classification is one of the most important techniques for analyzing hyperspectral images that have hundreds of spectrum luminance values. For this classification, supervised learning methods are widely used, but in general, they have a trade-off between their accuracy and computational complexity. In this paper, we propose an FPGA implementation of hyperspectral image classification based on a composite kernel method. Because of the large size of hyperspectral images, the data mapping becomes the most critical issue for achieving higher processing speed. Two data mapping approaches are discussed, and the one that is most suitable for our target images is implemented on an FPGA. Its processing speed for 145×145 pixel images is fast enough for real-time processing, and its accuracy is comparable with other classification algorithms.
Citations: 4
Performance Estimation for Exascale Reconfigurable Dataflow Platforms
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/FPT.2018.00062
Ryota Yasudo, J. Coutinho, A. Varbanescu, W. Luk, H. Amano, Tobias Becker
Abstract: The next generation high-performance computing platforms will need to support exascale computing. A promising path in achieving exascale is to embrace heterogeneity and specialised computing in the form of reconfigurable accelerators. However, assessing the feasibility of heterogeneous exascale systems requires fast and accurate performance prediction. This paper proposes PERKS, a novel performance estimation framework for reconfigurable dataflow platforms (RDPs). PERKS uses machine and application parameters to build an analytical model for predicting the performance of multi-accelerator systems. Moreover, model calibration is automatic, making the model flexible and usable for different machine configurations and applications. Our experimental results demonstrate that PERKS can predict the performance of current workloads and RDPs with an accuracy above 95%. We also demonstrate how the modelling scales to exascale workloads and exascale platforms.
Citations: 8
[Copyright notice]
2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI: 10.1109/fpt.2018.00003
Citations: 0