2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines: Latest Papers, Page 4

FPGA Gaussian Random Number Generators with Guaranteed Statistical Accuracy

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.47

David B. Thomas

Abstract: Many types of stochastic algorithms, such as Monte-Carlo simulations and Bit-Error-Rate testing, require very high run-times and are often trivially parallelisable, so are natural candidates for execution using FPGAs. However, the applications are reliant on Gaussian Random Number Generators (GRNGs) with good statistical properties, as very small biases over trillions of random samples can lead to incorrect results. Previous hardware GRNGs have focussed on area-efficient algorithms to produce Gaussian distributions under idealised assumptions, but do not make statements about the actual distribution coming out of real fixed-point hardware. In this paper, we present a new type of GRNG called a Piecewise-CLT, which uses a weighted blend of many small smooth distributions to approximate the Gaussian. By adjusting the weights, it is possible to directly target the distribution of the Gaussian, resulting in a circuit with an exactly quantified output distribution. Three members of the PwCLT family are presented, ranging from medium-area with good quality, up to a generator providing guaranteed statistical accuracy out to 12-sigma. We also show that PwCLT provides a better area-accuracy tradeoff than all existing high-speed scalar FPGA GRNGs, and can provide extremely high levels of statistical quality not possible in any previous methods.
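As a rough illustration of the "weighted blend of many small smooth distributions" idea in the abstract, the following Python sketch mixes narrow triangular components (each the sum of two uniforms, i.e. a two-term CLT) whose weights follow the Gaussian density at their centres. It is not the PwCLT circuit from the paper: the function names, the 64-segment grid, the range of 6 standard deviations, and the choice of triangular components are all assumptions made for illustration, and it uses floating point rather than fixed-point hardware arithmetic.

```python
import math
import random


def make_piecewise_mixture(num_segments=64, half_range=6.0):
    """Build centres and weights for a blend of narrow triangular components.

    All parameters are illustrative assumptions, not values from the paper.
    """
    width = 2.0 * half_range / num_segments
    centers = [-half_range + (i + 0.5) * width for i in range(num_segments)]
    # Weight each component by the (unnormalised) Gaussian density at its centre.
    raw = [math.exp(-0.5 * c * c) for c in centers]
    total = sum(raw)
    weights = [w / total for w in raw]
    return centers, weights, width


def sample(rng, centers, weights, width):
    """Draw one approximately Gaussian sample from the weighted blend."""
    # Pick a component according to the weights.
    center = rng.choices(centers, weights=weights, k=1)[0]
    # Sum of two uniforms on [-width/2, width/2] is triangular on [-width, width].
    offset = (rng.random() - 0.5) * width + (rng.random() - 0.5) * width
    return center + offset
```

Because each triangle spans two grid cells, neighbouring components overlap and the blended density is a piecewise-linear interpolation of the weights at the component centres, so adjusting the weights reshapes the output distribution directly.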

Citations: 12

A Hierarchical Memory Architecture with NoC Support for MPSoC on FPGAs

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.55

Shiming Li, Miaoqing Huang, Hongyuan Ding, Sen Ma

Citations: 2

FPGA Implementation of Optical Flow Algorithm Based on Cost Aggregation

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.57

Y. Tanabe, T. Maruyama

Citations: 1

A Multi-phase Clock Time-to-Digital Convertor Based on ISERDES Architecture

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.22

Tian Xiang, Lei Zhao, Xi Jin, Tianqi Wang, S. Chu, C. Ma, Shubin Liu, Q. An, Xue Ben

Citations: 3

Timing Fault Detection in FPGA-Based Circuits

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.32

Edward A. Stott, Joshua M. Levine, P. Cheung, Nachiket Kapre

Citations: 22

Abstract: Shared L2 Cache Management in Multicore Real-Time System

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.52

Gang Chen, Biao Hu, Kai Huang, A. Knoll, Di Liu

Citations: 1

Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.19

Dajiang Liu, S. Yin, Leibo Liu, Shaojun Wei

Citations: 0

FPGA Acceleration for Simultaneous Medical Image Reconstruction and Segmentation

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.54

Peng Li, Thomas Page, Guojie Luo, Wentai Zhang, Pei Wang, Peng Zhang, P. Maass, M. Jiang, J. Cong

Citations: 6

Image Signal Processors on FPGAs

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-01 DOI: 10.1109/FCCM.2014.58

Di Wu, Andreas Moshovos

Citations: 0

Accelerator of Stacked Convolutional Independent Subspace Analysis for Deep Learning-Based Action Recognition

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-01 DOI: 10.1109/FCCM.2014.37

Lu He, Yan Luo, Yu Cao

Abstract: Action recognition has been a research challenge in multimedia computing and machine vision. Recent advances in deep learning combined with stacked convolutional Independent Subspace Analysis (ISA) have achieved performance superior to all previously published results on several publicly available data sets. Unfortunately, one major issue in large-scale deployment of this new deep learning-based approach is the unacceptable latency of training with high-dimensional data. In this paper, we propose a new hardware accelerator that can substantially reduce the training time for deep learning-based action recognition. Specifically, our proposed approach focuses on accelerating the convolutional stacked ISA algorithm, the core component of deep learning-based action recognition algorithms. We design parallel pipelines, data parallelism, and look-up tables to speed up the algorithm. With an embedded heterogeneous platform consisting of a general-purpose processor and an FPGA, we are able to achieve up to 10X speedup for stacked ISA training compared to a software-only implementation.
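The abstract does not spell out the computation being accelerated. As a minimal sketch, assuming the commonly used two-stage ISA formulation (linear filters, squaring, pooling over subspaces, square root), a single ISA layer's forward response can be written in Python as below. The function name `isa_layer`, the matrices `W` and `V`, and the toy values are hypothetical and are not taken from the paper; the paper's pipelining, data parallelism, and look-up tables are not modelled here.

```python
import math


def isa_layer(x, W, V):
    """Two-stage ISA response: filter, square, pool within subspaces, take the root.

    x: input vector; W: filter matrix (one row per filter);
    V: fixed pooling matrix (one row per subspace). All names are illustrative.
    """
    # First stage: linear filters, one output per row of W.
    linear = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    squared = [u * u for u in linear]
    # Second stage: pool squared outputs into subspaces, then take the square root.
    return [math.sqrt(sum(v * s for v, s in zip(row, squared))) for row in V]


# Toy example: four filters grouped into two subspaces of two filters each.
x = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
V = [[1, 1, 0, 0], [0, 0, 1, 1]]
responses = isa_layer(x, W, V)
```

Each filter output and each subspace response is independent of the others, which is one reason a computation of this shape lends itself to parallel pipelines in hardware.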

Citations: 3