2022 International Conference on Field-Programmable Technology (ICFPT): Latest Papers, Page 5

Area-Efficient Memory Scheduling for Dynamically Scheduled High-Level Synthesis

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974262

Xue-Xin He, Jianyi Cheng, G. Constantinides

Abstract: In high-level synthesis, scheduling maps operations into clock cycles. It can either be done at compile time (statically) or run time (dynamically). There has been recent interest in dynamic scheduling as it can potentially achieve better performance. The state-of-the-art dynamically scheduled HLS tool Dynamatic generates dataflow-style hardware in a netlist of pre-defined components connected using handshake signals. The memory operations are executed by a component named load-store queue (LSQ), which can achieve run-time out-of-order memory accesses for high performance. However, the additional logic for the LSQ leads to significant area overhead compared to static scheduling. In this paper, we propose an area-efficient approach for scheduling memory operations at run time. We approximate the memory dependence distance to its minimal value and efficiently parallelise memory accesses in dynamically scheduled hardware. Over several benchmarks from related works, our results show that our approach achieves on average $0.2\times$ of the area-delay product compared to the original designs using LSQs.

Citations: 0
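
The abstract of "Area-Efficient Memory Scheduling for Dynamically Scheduled High-Level Synthesis" relies on the notion of a memory dependence distance. As a rough illustration only, the sketch below computes the minimal distance (in loop iterations) between a store and a later load of the same address from a recorded access trace. The function name and the trace format are assumptions made for this example; this is not the approximation method proposed in the paper.

```python
def min_dependence_distance(store_addrs, load_addrs):
    """Return the smallest d > 0 such that the load in iteration i + d reads
    the address written by the store in iteration i, or None if there is none.

    store_addrs[i] / load_addrs[i] are the addresses accessed in iteration i
    (hypothetical trace format: one load and one store per iteration).
    """
    last_store_iter = {}  # address -> most recent iteration that stored to it
    best = None
    for i, (st, ld) in enumerate(zip(store_addrs, load_addrs)):
        # The load is checked before this iteration's store is recorded,
        # so only stores from strictly earlier iterations are considered.
        if ld in last_store_iter:
            d = i - last_store_iter[ld]
            if best is None or d < best:
                best = d
        last_store_iter[st] = i
    return best


# Trace of the loop body `a[i + 3] = a[i] + 1` for i = 0..7.
stores = [i + 3 for i in range(8)]
loads = list(range(8))
print(min_dependence_distance(stores, loads))  # 3
```

A distance of 3 means that up to three consecutive iterations touch disjoint addresses and could, in principle, issue their memory accesses in parallel.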

Efficient Reinforcement Learning Framework for Automated Logic Synthesis Exploration

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974330

Yu Qian, Xuegong Zhou, Hao Zhou, Lingli Wang

Abstract: Logic synthesis is a crucial step in electronic design automation tools for integrated circuit design. In recent years, the development of reinforcement learning (RL) has enabled designers to automatically explore the logic synthesis process. Existing RL-based methods typically use conventional on-policy models, which leads to data inefficiency. Moreover, the exploration approach for FPGA technology mapping in recent works lacks flexibility in the learning process. In this work, we propose ESE, a reinforcement-learning-based framework to efficiently learn the logic synthesis process. The framework supports modeling of both logic optimization and FPGA technology mapping. The reward functions and terminal conditions in the RL environment are designed to efficiently guide the optimization of the metrics and execution time. For the modeling of FPGA mapping, logic optimization and technology mapping are combined so that they can be learned in a flexible way. Moreover, the Proximal Policy Optimization model is adopted to improve the utilization of samples. The proposed framework is evaluated on several common benchmarks. For logic optimization on the EPFL benchmark, compared with previous works, the proposed method obtains an 11.3% improvement in the average quality (node-level-product) and reduces the execution time by 13.7%. For FPGA technology mapping on the VTR benchmark, our method improves the average quality (LUT-level-product) by 14.8% and reduces the execution time by 14.4% compared with the recent work.

Citations: 0
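
The abstract of "Efficient Reinforcement Learning Framework for Automated Logic Synthesis Exploration" reports quality as a node-level-product or LUT-level-product. The sketch below shows one plausible reading of such a metric, assuming it is the product of the element count (nodes or LUTs) and the logic depth in levels, with lower values being better. That reading, the function names, and the numbers are assumptions for illustration and are not taken from the paper.

```python
def level_product(size, levels):
    """Quality metric assumed here: element count times logic depth (lower is better)."""
    return size * levels


def improvement_pct(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


# Made-up circuit statistics, purely for illustration.
baseline = level_product(size=1000, levels=20)  # 20000
ours = level_product(size=900, levels=19)       # 17100
print(improvement_pct(baseline, ours))          # 14.5
```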

Accelerating Transformer Neural Networks on FPGAs for High Energy Physics Experiments

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974463

Filip Wojcicki, Zhiqiang Que, A. Tapper, W. Luk

Abstract: High Energy Physics studies the fundamental forces and elementary particles of the Universe. With the unprecedented scale of experiments comes the challenge of accurate, ultra-low latency decision-making. Transformer Neural Networks (TNNs) have been proven to accomplish cutting-edge accuracy in classification for hadronic jet tagging. Nevertheless, software-centered solutions targeting CPUs and GPUs lack the inference speed required for real-time particle triggers, most notably those at the CERN Large Hadron Collider. This paper proposes a novel TNN-based architecture, efficiently mapped to Field-Programmable Gate Arrays, that outperforms GPU inference capabilities involving state-of-the-art neural network models by approximately 1000 times while preserving comparable classification accuracy. The design offers high customizability and aims to bridge the gap between hardware and software development by using High-Level Synthesis. Moreover, we propose a novel model-independent post-training quantization search algorithm that works in general hardware environments according to user-defined constraints. Experimental evaluation yields a 64% reduction in overall bit-widths with a 2% accuracy loss.

Citations: 3
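
The abstract of "Accelerating Transformer Neural Networks on FPGAs for High Energy Physics Experiments" mentions a post-training quantization search under user-defined constraints. The sketch below illustrates the general idea in a heavily simplified form: it searches for the smallest number of fractional fixed-point bits that keeps the rounding error of a few values below a tolerance. Using a per-value error bound as a stand-in for an accuracy constraint, as well as all names and numbers, are assumptions of this example; the paper's actual algorithm is not described in the listing.

```python
def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def smallest_frac_bits(values, max_abs_error, max_bits=16):
    """Return the smallest fractional bit-width whose rounding error stays
    within `max_abs_error` for every value, or None if none qualifies."""
    for bits in range(max_bits + 1):
        if all(abs(v - quantize(v, bits)) <= max_abs_error for v in values):
            return bits
    return None


weights = [0.3, -1.25, 0.8125]  # made-up values for illustration
print(smallest_frac_bits(weights, max_abs_error=0.02))  # 4
```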

Cloning the Unclonable: Physically Cloning an FPGA Ring-Oscillator PUF

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974597

Hayden Cook, Jonathan Thompson, Zephram Tripp, B. Hutchings, Jeffrey B. Goeders

Abstract: This work presents a novel technique to physically clone a ring oscillator physically unclonable function (RO PUF) onto another distinct FPGA die, using precise, targeted aging. The resulting cloned RO PUF provides a response that is identical to its copied FPGA counterpart, i.e., the FPGA and its clone are indistinguishable from each other. Targeted aging is achieved by: 1) heating the FPGA using bitstream-located short circuits, and 2) enabling/disabling ROs in the same FPGA bitstream. During self heating caused by short-circuits contained in the FPGA bitstream, circuit areas containing oscillating ROs (enabled) degrade more slowly than circuit areas containing non-oscillating ROs (disabled), due to bias temperature instability effects. This targeted aging technique is used to swap the relative frequencies of two ROs that will, in turn, flip the corresponding bit in the PUF response. Two experiments are described. The first experiment uses targeted aging to create an FPGA that exhibits the same PUF response as another FPGA, i.e., a clone of an FPGA PUF onto another FPGA device. The second experiment demonstrates that this aging technique can create an RO PUF with any desired response.

Citations: 2
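
The abstract of "Cloning the Unclonable" states that swapping the relative frequencies of two ROs flips the corresponding PUF response bit. The sketch below shows a common way such a response can be derived, assuming one bit per RO pair obtained by comparing the two frequencies. The pairing scheme, the function name, and the frequency values are assumptions for illustration and are not taken from the paper.

```python
def puf_response(freqs_mhz, pairs):
    """One response bit per RO pair: 1 if the first RO is faster, else 0."""
    return [1 if freqs_mhz[a] > freqs_mhz[b] else 0 for a, b in pairs]


pairs = [(0, 1), (2, 3)]

# Made-up frequencies before aging.
fresh = [201.3, 200.9, 199.8, 200.1]
print(puf_response(fresh, pairs))  # [1, 0]

# Hypothetical state after targeted aging has slowed RO 0 below RO 1.
aged = [200.5, 200.9, 199.8, 200.1]
print(puf_response(aged, pairs))   # [0, 0]
```

Only the ordering within each pair matters, which is why a small, targeted frequency shift is enough to change a bit.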

Modeling FPGA-based Architectures for Robotics

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974412

Ariel Podlubne, D. Göhringer

Citations: 0

GraFF: A Multi-FPGA System with Memory Semantic Fabric for Scalable Graph Processing

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974189

Xu Zhang, Yisong Chang, Tianyue Lu, Ke Liu, Ke Zhang, Mingyu Chen

Citations: 0

FPT 22 on Site Proceedings

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974397

Citations: 0

Leveraging FPGA Primitives to Improve Word Reconstruction during Netlist Reverse Engineering

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974401

Reilly McKendrick, Corey Simpson, B. Nelson, Jeffrey B. Goeders

Citations: 1

Load-Store Queue Sizing for Efficient Dataflow Circuits

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974425

Jiantao Liu, Carmine Rizzi, Lana Josipović

Abstract: Dataflow circuits implement dynamic scheduling and have recently been explored as an alternative to standard, statically scheduled high-level synthesis (HLS) solutions. In contrast to static HLS, dataflow circuits resolve memory dependencies during runtime by employing load-store queues (LSQs) at the memory interface. However, LSQs are extremely resource-expensive to implement in a spatial system and may cause notable frequency degradation. Therefore, there is a clear need to minimize their size and complexity, while still allowing the circuit to achieve a high computational rate. So far, designers resorted to manually tuning the LSQ depth (i.e., number of queue entries) to trade off area and performance; yet, this approach is evidently time-consuming and unfeasible for complex designs. In this work, we develop a strategy to automatically determine the most affordable LSQ depths in dataflow circuits while maintaining the best possible circuit throughput. We demonstrate our technique on benchmarks obtained from C code with different memory access patterns and show that it can effectively produce the desired Pareto-optimal design points.

Citations: 2
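
The abstract of "Load-Store Queue Sizing for Efficient Dataflow Circuits" refers to Pareto-optimal design points in the trade-off between area and performance. The sketch below shows what that term means for a set of candidate designs, each described by an (area, throughput) pair where smaller area and higher throughput are preferred. The function name and the data are assumptions for illustration; the paper's sizing strategy itself is not reproduced here.

```python
def pareto_front(points):
    """Keep the (area, throughput) points that no other point dominates.

    A point is dominated if another point has area <= and throughput >=,
    with at least one of the two comparisons strict.
    """
    front = []
    for i, (area, thr) in enumerate(points):
        dominated = any(
            (a2 <= area and t2 >= thr) and (a2 < area or t2 > thr)
            for j, (a2, t2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((area, thr))
    return front


# Made-up (area, throughput) pairs for five hypothetical LSQ depths.
designs = [(100, 0.5), (150, 0.8), (220, 1.0), (300, 1.0), (180, 0.7)]
print(pareto_front(designs))  # [(100, 0.5), (150, 0.8), (220, 1.0)]
```

In this example the design with area 300 is discarded because the one with area 220 reaches the same throughput at lower cost.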

FPGA Implementation of Low-Latency Recursive Median Filter

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974273

Bo Peng, Yuzhu Zhou, Qiang Li, Maosong Lin, Jiankui Weng, Qiang Zeng

Citations: 1