2022 International Conference on Field-Programmable Technology (ICFPT): Latest Papers (Page 3)

Application Specific Instruction-Set Processors for Machine Learning Applications

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974187

Muhammad Ali, D. Göhringer

Abstract: Machine learning algorithms are becoming more complicated over time in order to solve complex problems. This creates a gap among the embedded system solutions available for these algorithms, e.g. General-Purpose Processors (GPPs), Graphics Processing Units (GPUs), and hardware accelerators. Application-Specific Instruction-set Processors (ASIPs) are a promising way to bridge that gap. ASIPs are processor designs with an architecture tailored to a specific application. This allows a better efficiency (performance-to-power) ratio for application execution, and it adds more flexibility to the system than hardware accelerators do. The scope of this Ph.D. work is to develop a RISC-V-based ASIP for machine learning applications and to explore the design space of the optimizations. RISC-V is an open-source Instruction-Set Architecture (ISA) that allows custom application-specific instructions to be added to the ISA. Three main design-space optimizations of ASIPs will be explored: a specialized application-specific ISA, vector processing (for data-level parallelism), and a multi-core architecture (for task-level parallelism). The 32-bit RISC-V architecture is used as the base platform. For vector processing, the RISC-V V-extension is used for a SIMD-based architecture called the Vector Processing Unit (VPU), which is coupled with a 32-bit RISC-V host CPU. A modular memory system is implemented to support both a shared (bus-based) and a distributed (NoC-based) multi-core system. The memory system increases the flexibility and scalability of the system. Other known machine learning platforms are also explored and used as a comparison case.

Citations: 2

byteman: A Bitstream Manipulation Framework

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974549

Kristiyan Manev, Joseph Powell, Kaspar Matas, Dirk Koch

Citations: 0

EXPRESS: CNN EXecution Time PREdiction for DPU DeSign Space Exploration

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974299

Shikha Goel, Rajesh Kedia, Rijurekha Sen, M. Balakrishnan

Citations: 0

A Highly Customizable and Efficient Hardware Implementation for Parallel Matrix Inversion

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974569

Sultan S. Alqahtani, Yiqun Zhu, Qizhi Shi, Xiaolin Meng, Xinhua Wang

Abstract: This paper introduces an efficient and customizable FPGA-based architecture for parallel matrix inversion. The proposed architecture adapts to different matrix sizes with low latency and effective resource utilization. Hardware resource usage is optimized by re-using the same multiplication units for different calculations: the architecture uses multiple multiplication units in parallel to perform the normalization step and then re-uses them for the elimination step. Performance is enhanced by maximizing parallelism and minimizing the sequential execution time of the division unit. Compared with related works, the implementation results show that the proposed architecture is sufficiently flexible to support different matrix sizes with high parallel computing power. Additionally, the number of clock cycles and multiplication units of the proposed architecture is reduced proportionally to the increase in matrix size. The proposed architecture has been optimized for a Zynq xc7z045 FPGA and implemented using both single- and double-precision floating-point representations.
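The normalization and elimination steps named in the abstract can be illustrated in software. The following is a minimal sketch only, assuming the underlying method is Gauss-Jordan elimination on an augmented matrix (the listing does not state this); the function name `invert` and the use of exact fractions are illustrative choices, not taken from the paper. It performs one division per pivot and uses only multiplications afterwards, which mirrors the idea of keeping the division unit off the critical path.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I].

    Hypothetical software sketch; not the paper's hardware architecture.
    """
    n = len(matrix)
    # Augment A with the identity. Exact rationals avoid rounding noise here.
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting: choose the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]

        # Normalization: a single division, then multiplications only.
        inv_pivot = 1 / aug[col][col]
        aug[col] = [x * inv_pivot for x in aug[col]]

        # Elimination: multiply-subtract on every other row. These products
        # are independent of each other, so hardware could compute them in
        # parallel on the same multipliers used for normalization.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]
```

For example, `invert([[2, 1], [1, 1]])` yields a matrix equal to `[[1, -1], [-1, 2]]`, and a singular input raises `ValueError`.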

Citations: 0

$p$LPAQ: Accelerating LPAQ Compression on FPGA

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974593

Dongdong Tang, Xuan Sun, Nan Guan, Tei-Wei Kuo, C. Xue

Citations: 0

Hardware SAT Solver-based Area-efficient Accelerator for Autonomous Driving

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974200

Yusuke Inuma, Yuko Hara-Azumi

Citations: 0

CAPI-Precis: Towards a Compute-Centric Interface for Coherent Shared Memory Accelerators

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974504

A. Mughrabi, G. Byrd

Citations: 0

Acceleration of Fast Sample Entropy Towards Biomedical Applications on FPGAs

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974323

Chao Chen, B. Silva, Jianqing Li, Chengyu Liu

Abstract: Sample Entropy (SampEn) is an information entropy algorithm widely used for complexity analysis and chaos estimation in many applications. In particular, SampEn measures the complexity of a time series by the conditional probability of its inner pattern. Unfortunately, the straightforward implementation of SampEn has quadratic time complexity, which restricts its use for real-time analysis in health applications and for long-term data analysis. Although researchers have proposed fast versions of SampEn that avoid unnecessary comparisons, these have not yet been accelerated because of a performance bottleneck in the complex similarity pair process. In this paper, we evaluate fast SampEn algorithms using multi-source biomedical signals on a Field-Programmable Gate Array (FPGA). Since fast SampEn algorithms based on a pre-sorting stage promise to outperform other SampEn algorithms, Lightweight SampEn based on Merge Sort is implemented and optimized here. Different types of optimizations, which can be generalized to similar Lightweight-based SampEn algorithms, are used to reduce the overall latency while increasing data throughput. A load balancing strategy for multiple similarity pair modules is also proposed to resolve unbalanced loads, a bottleneck when increasing the execution parallelism of this type of algorithm. As a result, the proposed SampEn architecture runs 10 times faster than the fastest SampEn implementation on a modern CPU.
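To make the quadratic cost mentioned in the abstract concrete, here is a minimal sketch of the straightforward SampEn computation, not the Lightweight or Merge Sort variant the paper accelerates. It assumes the common definition SampEn = -ln(A / B), where B and A count template pairs of length m and m + 1 whose Chebyshev distance is within a tolerance r; the function and parameter names are illustrative, and whether the comparison is `<=` or `<` varies between formulations.

```python
import math


def sample_entropy(series, m, r):
    """Direct O(N^2) Sample Entropy, -ln(A / B). Illustrative sketch only."""
    n = len(series)

    def count_matches(length):
        # Count template pairs (i < j) whose element-wise differences are all
        # within r (Chebyshev distance <= r; an assumed convention).
        # The same n - m start positions are used for both template lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(series[i + k] - series[j + k]) <= r
                       for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        raise ValueError("SampEn undefined: no matching templates")
    return -math.log(a / b)
```

The nested loops over `i` and `j` are the all-pairs comparison that fast variants try to prune, for instance by sorting first so that only nearby values need to be compared.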

Citations: 1

An Energy-Efficient K-means Clustering FPGA Accelerator via Most-Significant Digit First Arithmetic

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974222

S. Gorgin, M. Gholamrezaei, D. Javaheri, Jeong-A. Lee

Citations: 0

Quality & Generality: A Flexible FPGA Re-Clustering Technique to Improve Packing and Placement

2022 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2022-12-05 DOI: 10.1109/ICFPT56656.2022.9974325

Mohamed A. Elgammal, Vaughn Betz

Citations: 1