2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP): Latest Papers

OpenMP device offloading to FPGA accelerators

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Pub Date: 2017-07-01, DOI: 10.1109/ASAP.2017.7995280

Lukas Sommer, Jens Korinth, A. Koch

Citations: 45

DeepPump: Multi-pumping deep Neural Networks

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Pub Date: 2017-07-01, DOI: 10.1109/ASAP.2017.7995281

Ruizhe Zhao, T. Todman, W. Luk, Xinyu Niu

Citations: 4

CFStore: Boosting Hybrid storage performance by device crossfire

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Pub Date: 2017-07-01, DOI: 10.1109/ASAP.2017.7995265

Wei Zhou, D. Feng, Zhipeng Tan

Citations: 1

Hardware-accelerated CCD readout smear correction for Fast Solar Polarimeter

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Pub Date: 2017-07-01, DOI: 10.1109/ASAP.2017.7995261

Stefan Tabel, Korbinian Weikl, W. Stechele

Citations: 1

Modeling and evaluation for gather/scatter operations in Vector-SIMD architectures

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Pub Date: 2017-07-01, DOI: 10.1109/ASAP.2017.7995271

Hongbing Tan, Haiyan Chen, Sheng Liu, Jianguo Wu

Citations: 2

Real-time object detection in software with custom vector instructions and algorithm changes

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Pub Date: 2017-07-01, DOI: 10.1109/ASAP.2017.7995262

Joe Edwards, G. Lemieux

Abstract: Real-time vision applications place stringent performance requirements on embedded systems. To meet these requirements, embedded systems often need hardware implementations. This approach is unfavorable because hardware development can be time-consuming, difficult to debug, and demanding of specialized skill. This paper presents a case study of accelerating face detection, often part of a complex image processing pipeline, using a software/hardware hybrid approach. As a baseline, the algorithm is initially run on a scalar ARM Cortex-A9 application processor found on a Xilinx Zynq device. Next, using a previously designed vector engine implemented in the FPGA fabric, the algorithm is vectorized, using only standard vector instructions, to achieve a 25× speedup. Then, we accelerate the critical inner loops by adding two hardware-assisted custom vector instructions for an additional 10× speedup, yielding a 248× speedup over the initial Cortex-A9 baseline. Collectively, the custom instructions require fewer than 800 lines of VHDL code, including comments and blank lines. Compared to previous hardware-only face detection systems, our work is 1.5 to 6.8 times faster. This approach demonstrates that good performance can be obtained from software-only vectorization, and that a small amount of custom hardware can provide a significant acceleration boost.

Citations: 2

An embedded scalable linear model predictive hardware-based controller using ADMM

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Pub Date: 2017-07-01, DOI: 10.1109/ASAP.2017.7995276

Pei Zhang, Joseph Zambreno, Phillip H. Jones

Abstract: Model predictive control (MPC) is a popular advanced model-based control algorithm for controlling systems that must respect a set of system constraints (e.g., actuator force limitations). However, the computing requirements of MPC limit the suitability of deploying its software implementation into embedded controllers requiring high update rates. This paper presents a scalable embedded MPC controller implemented on a field-programmable gate array (FPGA) coupled with an on-chip ARM processor. Our architecture implements an Alternating Direction Method of Multipliers (ADMM) approach for computing MPC controller commands. All computations are performed using floating-point arithmetic. We introduce a software/hardware (SW/HW) co-design methodology, in which the ARM software can configure on-chip Block RAM to allow users to 1) configure the MPC controller for a wide range of plants, and 2) update at runtime the desired trajectory to track. Our hardware architecture has the flexibility to trade off the amount of hardware resources used (Block RAMs and DSPs) against the controller computing speed. For example, this flexibility makes it possible to control plants modeled by a large number of decision variables (i.e., a plant model using many Block RAMs) with a small number of computing resources (i.e., DSPs) at the cost of increased computing time. The hardware controller is verified using a Plant-on-Chip (PoC), which is configured to emulate a mass-spring system in real time. A major driving goal of this work is to architect an SW/HW platform that brings FPGAs a step closer to being widely adopted by advanced control algorithm designers for deploying their algorithms into embedded systems.

Citations: 8
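
The ADMM approach named in the abstract above can be illustrated on a small problem. The following is a minimal software sketch, not the paper's hardware architecture: it assumes the MPC step has already been reduced to a box-constrained quadratic program (minimize 0.5*x'Px + q'x subject to lo <= x <= hi), which is one common formulation but is not confirmed by the abstract. The function names `solve_linear` and `admm_box_qp`, the penalty `rho`, and the fixed iteration count are all hypothetical choices made for illustration.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def admm_box_qp(p, q, lo, hi, rho=1.0, iters=200):
    """Minimize 0.5*x'Px + q'x subject to lo <= x <= hi (illustrative ADMM)."""
    n = len(q)
    # The x-update always solves against the same matrix P + rho*I.
    k = [[p[i][j] + (rho if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    z = [0.0] * n  # constrained copy of the decision variables
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # x-update: unconstrained quadratic minimization.
        rhs = [rho * (z[i] - u[i]) - q[i] for i in range(n)]
        x = solve_linear(k, rhs)
        # z-update: project onto the box constraints (e.g. actuator limits).
        z = [min(max(x[i] + u[i], lo[i]), hi[i]) for i in range(n)]
        # Dual update: accumulate the disagreement between x and z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


if __name__ == "__main__":
    # Unconstrained minimizer is [1, 4]; the upper bound of 3 clips the second entry.
    print(admm_box_qp([[2.0, 0.0], [0.0, 2.0]], [-2.0, -8.0],
                      [0.0, 0.0], [3.0, 3.0]))
```

For the example inputs the iteration converges to approximately [1.0, 3.0]. Each iteration consists of one linear solve, one clipping step, and one vector addition. A fixed iteration count is used here only to keep the sketch short; a practical implementation would typically stop on residual tolerances instead.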

A Staged Memory Resource Management Method for CMP systems

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Pub Date: 2017-07-01, DOI: 10.1109/ASAP.2017.7995264

Yangguo Liu, Junlin Lu, Dong Tong, Xu Cheng

Citations: 1

High-throughput area-efficient processor for 3GPP LTE cryptographic core algorithms

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Pub Date: 2017-07-01, DOI: 10.1109/ASAP.2017.7995285

Yuanhong Huo, Dake Liu

Citations: 2

RVNet: A fast and high energy efficiency network packet processing system on RISC-V

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Pub Date: 2017-07-01, DOI: 10.1109/ASAP.2017.7995266

Yanpeng Wang, M. Wen, Chunyuan Zhang, Jie Lin

Citations: 3