2019 International Conference on Field-Programmable Technology (ICFPT): Latest Papers

Real-Time Automatic Modulation Classification
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00052
Stephen Tridgell, D. Boland, P. Leong, Siddhartha
Abstract: Deep-learning-based techniques have shown promising results over traditional hand-crafted methods for automatic modulation classification of radio signals. However, implementing these deep learning models on specialized hardware can be challenging, as both latency and throughput are critical to achieving real-time response to over-the-air radio signals. In this work, we meet our targets by designing an optimized ternarized convolutional neural network that leverages the RF capabilities offered by the Xilinx ZCU111 RFSoC platform. The implemented networks achieve high-speed real-time performance with a classification latency of ≈8 µs and an operational throughput of 488k classifications per second. On the challenging open-source RadioML dataset, we achieve up to 81.1% accuracy, which is competitive with existing state-of-the-art software-only implementations.
Citations: 6
FPGA-Based Object Detection for Autonomous Driving System
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00094
K. Harada, K. Kanazawa, M. Yasunaga
Abstract: Autonomous driving systems require real-time detection of objects on the road, including pedestrians and obstacles. Object detection by image processing is a popular approach for autonomous driving systems. However, complex reflections caused by multiple light sources and objects on the road hinder robust, real-time object detection. This paper describes an FPGA-based object detection method for autonomous driving systems using multiple CMOS cameras. Our system is implemented on a Xilinx Zynq-7020 SoC FPGA, in which white-line tracking for lane keeping and obstacle/pedestrian detection are executed in real time by hardware in the Programmable Logic, while the whole system is controlled by software written in Python running on the Processing System (CPU).
Citations: 4
Dependency-Aware Clustering for Variable-Grained Hardware-Software Partitioning
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00080
Deshya Wijesundera, Nadeeshan D. K. Dissanayake, Alok Prakash, T. Srikanthan, Damith Anhettigama
Abstract: The increasing adoption of FPGA-based systems calls for efficient and effective partitioning of application components between the hardware and software of the FPGA platform. In this work, we propose a technique for application-specific, data-dependency-aware clustering that facilitates variable-grained hardware-software partitioning. The variable granularity makes the approach suitable for both large and small applications as well as stringent resource constraints, and mitigates the impact of relaxed communication models in partitioning heuristics. Validated on applications from the CHStone benchmark suite, the technique achieves 15% and 7% performance improvement compared to function-level and basic-block-level approaches, respectively.
Citations: 0
An Open-Source Lightweight Timing Model for RapidWright
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00028
P. Maidee, Christopher E. Neely, A. Kaviani, C. Lavin
Abstract: Access to detailed timing information for FPGA resources is essential to achieving the highest performance. Yet, for commercial FPGAs, much of this information is not published or available. At the same time, deploying large, fine-grained timing datasets adversely affects the speed of timing-driven place and route algorithms. We propose a nimble timing model for RapidWright that delivers high-fidelity timing approximations while enabling faster algorithms through a frugal memory footprint. By leveraging a combination of architectural knowledge, repeating patterns and extensive analysis of Vivado timing reports, we obtain a slightly pessimistic, lumped delay model within 2% average accuracy of Vivado for UltraScale+ devices. We validate the results with over 240 designs, and the proposed model shows high fidelity to Vivado with a Spearman's rank correlation coefficient of 0.99. By open sourcing the proposed model and describing the process, we empower the community to leverage and extend this work for customized domains, other device families, and additional accuracy.
Citations: 4
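The RapidWright abstract above reports fidelity to Vivado as a Spearman correlation of 0.99. As a rough illustration of what that metric measures, the sketch below computes Spearman's rank correlation using only the Python standard library. The delay values and the function names (`average_ranks`, `pearson`, `spearman`) are hypothetical and are not taken from the paper or from RapidWright.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical path delays in ns: a reference tool versus a lightweight model
# that is slightly pessimistic (every estimate is a little larger).
reference_delays = [1.20, 0.85, 2.40, 1.75, 3.10]
model_delays = [1.26, 0.88, 2.45, 1.90, 3.30]

rho = spearman(reference_delays, model_delays)
print(f"Spearman rho = {rho:.2f}")
```

Because the metric depends only on ordering, a model can overestimate every delay and still score a perfect correlation, which is why rank fidelity and average accuracy are reported as separate figures.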
SkyCastle: A Resource-Aware Multi-Loop Scheduler for High-Level Synthesis
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00013
J. Oppermann, Lukas Sommer, Lukas Weber, Melanie Reuter-Oppermann, A. Koch, O. Sinnen
Abstract: A common optimisation problem in the high-level synthesis (HLS) of FPGA-based accelerators is to find a microarchitecture that maximises the performance while keeping the utilisation of the device's low-level resources below certain limits. We propose to tackle it directly as part of the HLS scheduler. To that end, we formalise a general, integrated scheduling and allocation problem for HLS kernels, and present SkyCastle, a novel resource-aware multi-loop scheduler using integer linear programming to solve it for a subclass of kernels composed of multiple, nested loops. In order to demonstrate the practical applicability of the approach, we model the scheduler in such a way as to be plug-in compatible with the Xilinx Vivado HLS engine, allowing the computed solutions to be fed back into its synthesis flow. We evaluate SkyCastle for three non-trivial kernels from the machine learning, signal processing, and physical simulation domains, on two FPGA devices. Additionally, we investigate the replication of slightly slower, but smaller accelerators as a means to further boost the overall performance. In contrast to Vivado HLS' default settings, which aim at maximum performance but may fail in later synthesis steps, the solutions computed by our scheduler always result in synthesisable designs.
Citations: 3
A Study on Switch Block Patterns for Tileable FPGA Routing Architectures
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00039
Xifan Tang, Edouard Giacomin, Aurélien Alacchi, P. Gaillardon
Abstract: Following the rapid growth of Field Programmable Gate Array (FPGA) sizes, the regularity of architectures has become a critical feature, leading to the development of million-of-LUT devices. While the routing architecture plays a dominant role in the area, delay and power of modern FPGAs, most previously published works focus on improving the routability and performance of FPGAs, while very few have studied tileable (highly regular) routing architectures. In this paper, we provide a detailed comparison of tileable and popular non-tileable FPGAs considering modern routing architectures. First, we upgrade VPR to generate tileable routing architectures, which can support different switch block patterns for (1) the routing tracks that start/end in a tile and (2) the routing tracks that pass through a tile. Then, we evaluate the performance of mixed switch block patterns in the context of a Stratix IV-like FPGA architecture, by considering the most representative patterns, i.e., Subset, Universal and Wilton. Experimental results show that, averaged over the MCNC and VTR benchmarks, when compared to the well-optimized non-tileable architectures, the tileable architectures can improve the minimum routable channel width by 13% and area-delay product by 2%. In particular, our results show that in the context of tileable FPGAs, a mix of Universal and Wilton switch block patterns leads to the best tradeoff in area, delay and routability, while the Wilton switch block is the best choice in non-tileable FPGAs.
Citations: 14
AutoBoxing: Improving GCC Passes to Optimize HW/SW Multi-Versioning of Kernels for HLS
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00057
Johanna Rohde, C. Hochberger
Abstract: Compute-intensive software (SW) parts that are mapped to hardware (HW) during High-Level Synthesis often need to be present in the final software for fallback reasons. Optimal SW and HW implementations need very different optimizations. Thus, multiple versions of the same code have to be implemented. Yet, these different versions must use the same interface for compatibility reasons. In this contribution, we present AutoBoxing as a solution for this problem. We have implemented AutoBoxing in the PIRANHA GCC plugin and we demonstrate its effect using the Powerstone benchmarks.
Citations: 0
MajorityNets: BNNs Utilising Approximate Popcount for Improved Efficiency
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00062
Seyedramin Rasoulinezhad, Sean Fox, Hao Zhou, Lingli Wang, D. Boland, P. Leong
Abstract: Binarized neural networks (BNNs) have shown exciting potential for utilising neural networks in embedded implementations where area, energy and latency constraints are paramount. With BNNs, multiply-accumulate (MAC) operations can be simplified to XnorPopcount operations, leading to massive reductions in both memory and computation resources. Furthermore, multiple efficient implementations of BNNs have been reported on field-programmable gate array (FPGA) implementations. This paper proposes a smaller, faster, more energy-efficient approximate replacement for the XnorPopcount operation, called XNorMaj, inspired by state-of-the-art FPGA look-up table schemes which benefit FPGA implementations. We show that XNorMaj is up to 2x more resource-efficient than the XnorPopcount operation. While the XNorMaj operation has a minor detrimental impact on accuracy, the resource savings enable us to use larger networks to recover the loss.
Citations: 2
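The MajorityNets abstract above states that in a BNN a multiply-accumulate reduces to an XnorPopcount. A minimal Python sketch of that standard identity follows: for two vectors with entries in {-1, +1}, the dot product equals twice the number of matching signs minus the vector length. The bit packing convention and the function names (`pack_bits`, `xnor_popcount_dot`) are assumptions made for illustration. The sketch shows only the exact XnorPopcount form, not the paper's approximate XNorMaj operation.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1, +1} vectors given their packed encodings."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask   # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")     # popcount of the agreement word
    return 2 * matches - n              # matches minus mismatches


# Hypothetical activations and weights, both binarized to +1/-1.
activations = [1, -1, -1, 1, 1, -1, 1, 1]
weights = [1, 1, -1, -1, 1, -1, -1, 1]

reference = sum(a * w for a, w in zip(activations, weights))  # ordinary MAC
fast = xnor_popcount_dot(pack_bits(activations), pack_bits(weights), len(weights))
assert fast == reference
```

The multiplications disappear entirely: what remains is a bitwise XNOR and a bit count, which is why the popcount circuit becomes the main hardware cost that an approximate replacement would target.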
Power-Aware FPGA Mapping of Convolutional Neural Networks
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00059
Alexander Montgomerie-Corcoran, Stylianos I. Venieris, C. Bouganis
Abstract: With an unprecedented accuracy in numerous AI tasks, convolutional neural networks (CNNs) are rapidly deployed on power-limited mobile and embedded applications. Existing mapping approaches focus on achieving high performance without explicit consideration of power consumption, leading to suboptimal solutions when power is considered in a subsequent stage. In this context, there is an emerging need for power-aware methodologies for the design of custom CNN engines. In this work, a methodology is presented for modelling the power consumption of FPGA-based CNN accelerators using a high-level description of modules, together with a power-centric search strategy for exploring power-performance trade-offs within the CNN-to-FPGA design space. By integrating into an existing CNN-to-FPGA toolflow, the proposed power estimation method can yield a prediction accuracy of 93.4% for total system power consumption. Furthermore, it is demonstrated that the associated power-oriented exploration approach can generate CNN accelerators with a 20.1% power reduction over a purely throughput-driven design for AlexNet, maintaining the design's throughput.
Citations: 5
Enhanced Heterogeneous Cloud: Transparent Acceleration and Elasticity
2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00027
Jessica Vandebon, J. Coutinho, W. Luk, E. Nurvitadhi, Mishali Naik
Abstract: This paper presents ORIAN, a fully-managed Platform-as-a-Service (PaaS) for deploying high-level applications onto large-scale heterogeneous cloud infrastructures. We aim to make specialised, accelerator resources in the cloud accessible to software developers by extending the traditional homogeneous PaaS execution model to support automatic runtime management of heterogeneous compute resources such as CPUs and FPGAs. In particular, we focus on two mechanisms: transparent acceleration, which automatically maps jobs to the most suitable resource configuration, and heterogeneous elasticity, which performs automatic vertical (type) and horizontal (quantity) scaling of provisioned resources to guarantee QoS (Quality of Service) objectives while minimising cost. We develop a prototype to validate our approach, targeting a hardware platform with combined computational capacity of 28 FPGAs and 36 CPU cores, and evaluate it using case studies in three application domains: machine learning, bioinformatics, and physics. Our transparent acceleration decisions achieve on average 96% of the maximum manually identified static configuration throughput for large workloads, while removing the burden of determining configuration from the user; an elastic ORIAN resource group provides a 2.3 times cost reduction compared to an over-provisioned group for non-uniform, peaked job sequences while guaranteeing QoS objectives; and our malleable architecture extends to support a new, more suitable resource type, automatically reducing the cost by half while maintaining throughput, and achieving a 23% throughput increase while fulfilling resource constraints.
Citations: 4