2021 International Conference on Field-Programmable Technology (ICFPT)最新文献

A streaming hardware architecture for real-time SIFT feature extraction 一种实时SIFT特征提取的流硬件架构

2021 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609932

Hector A. Li Sanchez, A. George

{"title":"A streaming hardware architecture for real-time SIFT feature extraction","authors":"Hector A. Li Sanchez, A. George","doi":"10.1109/ICFPT52863.2021.9609932","DOIUrl":"https://doi.org/10.1109/ICFPT52863.2021.9609932","url":null,"abstract":"The Scale-Invariant Feature Transform (SIFT) is a feature extractor that serves as a key step in many computer-vision pipelines. Real-time operation based on a software-only approach is often infeasible, but FPGAs can be employed to parallelize execution and accelerate the application to meet latency requirements. In this study, we present a stream-based hardware acceleration architecture for SIFT feature extraction. Using a novel strategy to store pixels required for descriptor computation, the execution time needed to generate SIFT descriptors is greatly improved relative to previous designs. This strategy also enables further reduction of the execution time by introducing multiple processing elements (PEs) for computation of several SIFT descriptors in parallel. Additionally, the proposed architecture supports keypoint detection at an arbitrary number of octaves and allows for runtime configuration of various parameters. An FPGA implementation targeting the Xilinx Zynq-7045 system-on-chip (SoC) device is deployed to demonstrate the efficiency of the proposed architecture. In the target hardware, the resulting system is capable of processing images with a resolution of 1280 × 720 pixels at up to 150 FPS while maintaining modest resource utilization.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114938703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

FastCGRA: A Modeling, Evaluation, and Exploration Platform for Large-Scale Coarse-Grained Reconfigurable Arrays FastCGRA:一个大规模粗粒度可重构阵列的建模、评估和探索平台

2021 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609928

Su Zheng, Kaisen Zhang, Yaoguang Tian, Wenbo Yin, Lingli Wang, Xuegong Zhou

{"title":"FastCGRA: A Modeling, Evaluation, and Exploration Platform for Large-Scale Coarse-Grained Reconfigurable Arrays","authors":"Su Zheng, Kaisen Zhang, Yaoguang Tian, Wenbo Yin, Lingli Wang, Xuegong Zhou","doi":"10.1109/ICFPT52863.2021.9609928","DOIUrl":"https://doi.org/10.1109/ICFPT52863.2021.9609928","url":null,"abstract":"Coarse-Grained Reconfigurable Arrays (CGRAs) provide sufficient flexibility in domain-specific applications with high hardware efficiency, which make CGRAs suitable for fast-evolving fields such as neural network acceleration and edge computing. To meet the requirement of the fast evolution, we propose FastCGRA, the modeling, mapping, and exploration platform for large-scale CGRAs. FastCGRA supports hierarchical architecture description and automatic switch module generation. Connectivity-aware packing and graph partition algorithms are designed to reduce the complexity of placement and routing. The graph homomorphism placement algorithm in FastCGRA enables efficient placement on large-scale CGRAs. The packing and placement algorithms cooperate with a negotiation-based routing algorithm to form an integral mapping procedure. FastCGRA can support the modeling and mapping of large-scale CGRAs with significantly higher placement and routing efficiency than existing platforms. The automatic switch module generation method can reduce the complexity of CGRA interconnection design. With these features, FastCGRA can boost the exploration of large-scale CGRAs.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115757638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

In-Storage Computation of Histograms with differential privacy 差分隐私直方图的存储计算

2021 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609899

Andrei Tosa, A. Hangan, G. Sebestyen, Z. István

{"title":"In-Storage Computation of Histograms with differential privacy","authors":"Andrei Tosa, A. Hangan, G. Sebestyen, Z. István","doi":"10.1109/ICFPT52863.2021.9609899","DOIUrl":"https://doi.org/10.1109/ICFPT52863.2021.9609899","url":null,"abstract":"Network-attached Smart Storage is becoming increasingly common in data analytics applications. It relies on processing elements, such as FPGAs, close to the storage medium to offload compute-intensive operations, reducing data movement across distributed nodes in the system. As a result, it can offer outstanding performance and energy efficiency. Modern data analytics systems are not only becoming more distributed they are also increasingly focusing on privacy policy compliance. This means that, in the future, Smart Storage will have to offload more and more privacy-related processing. In this work, we explore how the computation of differentially private (DP) histograms, a basic building block of privacy-preserving analytics, can be offloaded to FPGAs. By performing DP aggregation on the storage side, untrusted clients can be allowed to query the data in aggregate form without risking the leakage of personally identifiable information. We prototype our idea by extending an FPGA-based distributed key-value store with three new components. First, a histogram module, that processes values at 100Gbps line-rate. Second, a random noise generator that adds noise to final histogram according to the rules dictated by DP. Third, a mechanism to limit the rate at which key-value pairs can be used in histograms, to stay within the DP privacy budget.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131347501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Scalable and Flexible High-Performance In-Network Processing of Hash Joins in Distributed Databases 分布式数据库中哈希连接的可扩展和灵活的高性能网络处理

2021 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609804

John M. Wirth, Jaco A. Hofmann, Lasse Thostrup, Carsten Binnig, Andreas Koch

{"title":"Scalable and Flexible High-Performance In-Network Processing of Hash Joins in Distributed Databases","authors":"John M. Wirth, Jaco A. Hofmann, Lasse Thostrup, Carsten Binnig, Andreas Koch","doi":"10.1109/ICFPT52863.2021.9609804","DOIUrl":"https://doi.org/10.1109/ICFPT52863.2021.9609804","url":null,"abstract":"Programmable switches allow to offload specific processing tasks into the network and promise multi-Tbit/s throughput. One major goal when moving computation to the network is typically to reduce the volume of network traffic, and thus improve the overall performance. In this manner, programmable switches are increasingly used, both in research as well as in industry applications, for various scenarios, including statistics gathering, in-network consensus protocols, and more. However, the currently available programmable switches suffer from several practical limitations. One important restriction is the limited amount of available memory, making them unsuitable for stateful operations such as Hash Joins in distributed databases. In previous work, an FPGA-based In-Network Hash Join accelerator was presented, initially using DDR-DRAM to hold the state. In a later iteration, the hash table was moved to on-chip HBM-DRAM to improve the performance even further. However, while very fast, the size of the joins in this setup was limited by the relatively small amount of available HBM. In this work, we heterogeneously combine DDR-DRAM and HBM memories to support both larger joins and benefit from the far faster and more parallel HBM accesses. In this manner, we are able to improve the performance by a factor of 3x compared to the previous HBM-based work. We also introduce additional configuration parameters, supporting a more flexible adaptation of the underlying hardware architecture to the different join operations required by a concrete use-case.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130240373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Efficient Physical Page Migrations in Shared Virtual Memory Reconfigurable Computing Systems 共享虚拟内存可重构计算系统中的高效物理页面迁移

2021 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609831

Torben Kalkhof, Andreas Koch

{"title":"Efficient Physical Page Migrations in Shared Virtual Memory Reconfigurable Computing Systems","authors":"Torben Kalkhof, Andreas Koch","doi":"10.1109/ICFPT52863.2021.9609831","DOIUrl":"https://doi.org/10.1109/ICFPT52863.2021.9609831","url":null,"abstract":"Shared Virtual Memory (SVM) can considerably simplify the application development for FPGA-accelerated computers, as it allows the seamless passing of virtually addressed pointers across the hardware/software boundary. Especially applications operating on complex pointer-based data structures can profit from this approach, as SVM can often avoid having to copy the entire data to FPGA memory, while performing pointer relocations in the process. Many FPGA-accelerated computers, especially in a data center setting, employ PCIe-attached boards that have FPGA-local memory in the form of on-chip HBM or on-board DRAM. Accesses to this local memory are much faster than going to the host memory via PCIe. Thus, even in the presence of SVM, it is desirable to be able to move the physical memory pages holding frequently accessed data closest to the compute unit that is operating on them. This capability is called physical page migration. The main contribution of this work is an open-source framework which provides SVM with physical page migration capabilities to PCIe-attached FPGA cards. We benchmark both fully automatic on-demand and user-managed explicit migration modes, and show that for suitable use-cases, the performance of migrations cannot just match that of conventional DMA copy-based accelerator operations, but may even exceed it by overlapping computations and migrations.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133087760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Resource-saving FPGA Implementation of the Satisfiability Problem Solver: AmoebaSATslim 可满足性问题求解器的FPGA实现:AmoebaSATslim

2021 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609882

Yi Yan, H. Amano, M. Aono, Kaori Ohkoda, Shingo Fukuda, Kenta Saito, S. Kasai

{"title":"Resource-saving FPGA Implementation of the Satisfiability Problem Solver: AmoebaSATslim","authors":"Yi Yan, H. Amano, M. Aono, Kaori Ohkoda, Shingo Fukuda, Kenta Saito, S. Kasai","doi":"10.1109/ICFPT52863.2021.9609882","DOIUrl":"https://doi.org/10.1109/ICFPT52863.2021.9609882","url":null,"abstract":"The Boolean satisfiability problem (SAT) is an NP-complete combinatorial optimization problem, where fast SAT solvers are useful for various smart society applications. Since these edge-oriented applications require time-critical control, a high speed SAT solver on FPGA is a promising approach. Here the authors propose a novel FPGA implementation of a bio-inspired stochastic local search algorithm called ‘AmoebaSAT’ on a Zynq board. Previous studies on FPGA-AmoebaSATs tackled relatively smaller-sized 3-SAT instances with a few hundred variables and found the solutions in several milli seconds. These implementations, however, adopted an instance-specific approach, which requires synthesis of FPGA configuration every time when the targeted instance is altered. In this paper, a slimmed version of AmoebaSAT named ‘AmoebaSATslim,’ which omits the most resource-consuming part of interactions among variables, is proposed. The FPGA-AmoebaSATslim enables to tackle significantly larger-sized 3-SAT instances, accepting 30,000 variables with 130, 800 clauses. It achieves up to approximately 24 times faster execution speed than the software-AmoebaSATslim implemented on a CPU of the x86 server.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130786852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Profiling-Based Control-Flow Reduction in High-Level Synthesis 高级合成中基于剖面的控制流量减少

2021 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609816

Austin Liolli, Omar Ragheb, J. Anderson

引用次数: 0

An area-efficient multiply-accumulation architecture and implementations for time-domain neural processing 时域神经处理的面积效率倍增累积架构与实现

2021 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609809

Ichiro Kawashima, Yuichi Katori, T. Morie, H. Tamukoh

引用次数: 2

Increasing Memory Efficiency of Hash-Based Pattern Matching for High-Speed Networks 提高高速网络中基于哈希模式匹配的内存效率

2021 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609859

Tomás Fukac, J. Matoušek, J. Korenek, Lukás Kekely

{"title":"Increasing Memory Efficiency of Hash-Based Pattern Matching for High-Speed Networks","authors":"Tomás Fukac, J. Matoušek, J. Korenek, Lukás Kekely","doi":"10.1109/ICFPT52863.2021.9609859","DOIUrl":"https://doi.org/10.1109/ICFPT52863.2021.9609859","url":null,"abstract":"Increasing speed of network links continuously pushes up requirements on the performance of network security and monitoring systems, including their typical representative and its core function: an intrusion detection system (IDS) and pattern matching. To allow the operation of IDS applications like Snort and Suricata in networks supporting throughput of 100Gbps or even more, a recently proposed pre-filtering architecture approximates exact pattern matching using hash-based matching of short strings that represent a given set of patterns. This architecture can scale supported throughput by adjusting the number of parallel hash functions and on-chip memory blocks utilized in the implementation of a hash table. Since each hash function can address every memory block, scaling throughput also increases the total capacity of the hash table. Nevertheless, the original architecture utilizes the available capacity of the hash table inefficiently. We therefore propose three optimization techniques that either reduce the amount of information stored in the hash table or increase its achievable occupancy. Moreover, we also design modifications of the architecture that enable resource-efficient utilization of all three optimization techniques together in synergy. Compared to the original pre-filtering architecture, combined use of the proposed optimizations in the 100Gbps scenario increases the achievable capacity for short strings by three orders of magnitude. It also reduces the utilization of FPGA logic resources to only a third.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121990730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

A unified accelerator design for LiDAR SLAM algorithms for low-end FPGAs 用于低端fpga的激光雷达SLAM算法的统一加速器设计

2021 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609886

K. Sugiura, Hiroki Matsutani

{"title":"A unified accelerator design for LiDAR SLAM algorithms for low-end FPGAs","authors":"K. Sugiura, Hiroki Matsutani","doi":"10.1109/ICFPT52863.2021.9609886","DOIUrl":"https://doi.org/10.1109/ICFPT52863.2021.9609886","url":null,"abstract":"A fast and reliable LiDAR (Light Detection and Ranging) SLAM (Simultaneous Localization and Mapping) system is the growing need for autonomous mobile robots, which are used for a variety of tasks such as indoor cleaning, navigation, and transportation. To bridge the gap between the limited processing power on such robots and the high computational requirement of the SLAM system, in this paper we propose a unified accelerator design for 2D SLAM algorithms on resource-limited FPGA devices. As scan matching is the heart of these algorithms, the proposed FPGA-based accelerator utilizes scan matching cores on the programmable logic part and users can switch the SLAM algorithms to adapt to performance requirements and environments without modifying and re-synthesizing the logic part. We integrate the accelerator into two representative SLAM algorithms, namely particle filter-based and graph-based SLAM. They are evaluated in terms of resource utilization, processing speed, and quality of output results with various real-world datasets, highlighting their algorithmic characteristics. Experiment results on a Pynq-Z2 board demonstrate that scan matching is accelerated by 13.67–14.84x, improving the overall performance of particle filter-based and graph-based SLAM by 4.03–4.67x and 3.09–4.00x respectively, while maintaining the accuracy comparable to their software counterparts and even state-of-the-art methods.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127075435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2