2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines: Latest Literature

FPGA Gaussian Random Number Generators with Guaranteed Statistical Accuracy
David B. Thomas
{"title":"FPGA Gaussian Random Number Generators with Guaranteed Statistical Accuracy","authors":"David B. Thomas","doi":"10.1109/FCCM.2014.47","DOIUrl":"https://doi.org/10.1109/FCCM.2014.47","url":null,"abstract":"Many types of stochastic algorithms, such as Monte-Carlo simulations and Bit-Error-Rate testing, require very high run-times and are often trivially parallelisable, so are natural candidates for execution using FPGAs. However, the applications are reliant on Gaussian Random Number Generators (GRNGs) with good statistical properties, as very small biases over trillions of random samples can lead to incorrect results. Previous hardware GRNGs have focussed on area-efficient algorithms to produce Gaussian distributions under idealised assumptions, but do not make statements about the actual distribution coming out of real fixed-point hardware. In this paper, we present a new type of GRNG called a Piecewise-CLT, which uses a weighted blend of many small smooth distributions to approximate the Gaussian. By adjusting the weights, it is possible to directly target the distribution of the Gaussian, resulting in a circuit with an exactly quantified output distribution. Three members of the PwCLT family are presented, ranging from medium-area with good quality, up to a generator providing guaranteed statistical accuracy out to 12-sigma. We also show that PwCLT provides a better area-accuracy tradeoff than all existing high-speed scalar FPGA GRNGs, and can provide extremely high levels of statistical quality not possible in any previous methods.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127760237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
A Hierarchical Memory Architecture with NoC Support for MPSoC on FPGAs
Shiming Li, Miaoqing Huang, Hongyuan Ding, Sen Ma
{"title":"A Hierarchical Memory Architecture with NoC Support for MPSoC on FPGAs","authors":"Shiming Li, Miaoqing Huang, Hongyuan Ding, Sen Ma","doi":"10.1109/FCCM.2014.55","DOIUrl":"https://doi.org/10.1109/FCCM.2014.55","url":null,"abstract":"This work presents a memory hierarchy with network-on-chip (NoC) support for MPSoC systems. The memory hierarchy consists of a shared global memory and private local memories, as shown in Figure 1. Each core in the system is equipped with two local memories, one for instructions and one for data. The MicroBlaze soft core used in this work connects to the main bus through the PLB interface and to the local memory modules through the LMB interface. It further connects to a 4x4 mesh NoC through the FSL interface, as shown in Figure 2(a). We built the generic NoC (NoC-g) using the open-source router designed by the Concurrent VLSI Architecture group at Stanford University [2]. Each router has 5 input ports and 5 output ports. Each input physical channel and each output physical channel is connected to 4 input virtual channels and 4 output virtual channels, respectively. The 40 virtual channels are connected to an internal crossbar switch for routing. We designed an adapter to connect the MicroBlaze processor to the router.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115047505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
FPGA Implementation of Optical Flow Algorithm Based on Cost Aggregation
Y. Tanabe, T. Maruyama
{"title":"FPGA Implementation of Optical Flow Algorithm Based on Cost Aggregation","authors":"Y. Tanabe, T. Maruyama","doi":"10.1109/FCCM.2014.57","DOIUrl":"https://doi.org/10.1109/FCCM.2014.57","url":null,"abstract":"The computational complexity of optical flow estimation is very high, and many hardware systems have been proposed. In these systems, Lucas-Kanade, tensor-based, and phase-based methods have been widely used. The census transform, which is widely used in stereo vision systems, was also implemented in several FPGA systems. In these systems, only one clock cycle is required to calculate one flow, and their processing speed is fast enough for real-time processing of high-resolution images. GPUs have also been used, and it was reported that the acceleration by FPGAs and GPUs is comparable[1][2]. The main problem in these systems is their low accuracy. The methods described above show high accuracy for regions with large changes in brightness, but show poor results for uniform regions. This problem is shared with stereo vision, and the approaches used in stereo vision can be applied to optical flow estimation. In this paper, we extend a cost aggregation algorithm[3] for optical flow estimation, and implement it on an FPGA.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133063503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Multi-phase Clock Time-to-Digital Convertor Based on ISERDES Architecture
Tian Xiang, Lei Zhao, Xi Jin, Tianqi Wang, S. Chu, C. Ma, Shubin Liu, Q. An, Xue Ben
{"title":"A Multi-phase Clock Time-to-Digital Convertor Based on ISERDES Architecture","authors":"Tian Xiang, Lei Zhao, Xi Jin, Tianqi Wang, S. Chu, C. Ma, Shubin Liu, Q. An, Xue Ben","doi":"10.1109/FCCM.2014.22","DOIUrl":"https://doi.org/10.1109/FCCM.2014.22","url":null,"abstract":"The time-to-digital converter (TDC) aims to record an accurate timestamp when an input signal arrives. Multi-phase clock sampling is a common way to map a TDC onto an FPGA. Traditionally, this method provides medium accuracy and low resource occupation. In this paper, we present a new TDC architecture based on the 2-ISERDES in the SelectIO, rather than utilizing Slice resources as in the traditional approach. The ISERDES-based TDC is equivalent to a TDC with 8 equidistant phase-shifted clocks, with a maximum clock frequency of 900 MHz. The least significant bit (LSB) is 139 ps, which is 445% better than the traditional architecture.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134297609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
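The 139 ps LSB quoted in the TDC abstract above is consistent with dividing one clock period by the number of equidistant phases. The sketch below checks that arithmetic. It assumes the usual multi-phase sampling relation LSB = 1 / (f_clk × N), which the abstract implies but does not state, and the function name `tdc_lsb_ps` is hypothetical.

```python
def tdc_lsb_ps(clock_hz: float, phases: int) -> float:
    """Bin width (LSB) in picoseconds of a multi-phase clock sampling TDC.

    Assumption: the phases are equidistant, so the clock period is
    split into `phases` equal time bins.
    """
    period_ps = 1e12 / clock_hz  # clock period in picoseconds
    return period_ps / phases


# Figures taken from the abstract: 8 phases at a 900 MHz maximum clock.
lsb = tdc_lsb_ps(900e6, 8)
print(f"{lsb:.1f} ps")  # 138.9 ps, which rounds to the quoted 139 ps
```

The abstract's "445% better" comparison depends on the baseline design and cannot be reproduced from these two numbers alone.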
Timing Fault Detection in FPGA-Based Circuits
Edward A. Stott, Joshua M. Levine, P. Cheung, Nachiket Kapre
{"title":"Timing Fault Detection in FPGA-Based Circuits","authors":"Edward A. Stott, Joshua M. Levine, P. Cheung, Nachiket Kapre","doi":"10.1109/FCCM.2014.32","DOIUrl":"https://doi.org/10.1109/FCCM.2014.32","url":null,"abstract":"The operation of FPGA systems, like most VLSI technology, is traditionally governed by static timing analysis, whereby safety margins for operating and manufacturing uncertainty are factored in at design-time. If we operate FPGA designs beyond these conservative margins we can obtain substantial energy and performance improvements. However, doing this carelessly would cause unacceptable impacts to reliability, lifespan and yield - issues which are growing more severe with continuing process scaling. Fortunately, the flexibility of FPGA architecture allows us to monitor and control reliability problems with a variety of runtime instrumentation and adaptation techniques. In this paper we develop a system for detecting timing faults in arbitrary FPGA circuits based on Razor-like shadow register insertion. Through a combination of calibration, timing constraint and adaptation of the CAD flow, we deliver low-overhead, trustworthy fault detection for FPGA-based circuits.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"47 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129360991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Abstract: Shared L2 Cache Management in Multicore Real-Time System
Gang Chen, Biao Hu, Kai Huang, A. Knoll, Di Liu
{"title":"Abstract: Shared L2 Cache Management in Multicore Real-Time System","authors":"Gang Chen, Biao Hu, Kai Huang, A. Knoll, Di Liu","doi":"10.1109/FCCM.2014.52","DOIUrl":"https://doi.org/10.1109/FCCM.2014.52","url":null,"abstract":"In multicore systems, shared cache interference has been recognized as one of the major factors that degrade both the average performance and the predictability of the system. How to manage the shared cache in order to optimize the system performance while guaranteeing the system predictability is still an open issue. State-of-the-art techniques on this topic use page coloring to partition the shared cache at the OS level. In this paper, we present a shared cache management scheme for multicore systems. This scheme supports way-based cache partitioning at the hardware level, building a task-level time-triggered reconfigurable-cache multicore system. We evaluated the proposed scheme w.r.t. different numbers of cores and cache modules and prototyped the constructed MPSoCs on FPGA.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127542409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures
Dajiang Liu, S. Yin, Leibo Liu, Shaojun Wei
{"title":"Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures","authors":"Dajiang Liu, S. Yin, Leibo Liu, Shaojun Wei","doi":"10.1109/FCCM.2014.19","DOIUrl":"https://doi.org/10.1109/FCCM.2014.19","url":null,"abstract":"A coarse-grained reconfigurable architecture is a promising architecture with high power efficiency, which is typically composed of a host controller and a processing element array (PEA). Loops are often mapped onto PEAs for acceleration. In previous work, the innermost loop is pipelined, and the maximal number of concurrently executable operators (CEOs) in the kernel is limited by the inner loop. Consider the loop body DFG of an input 2D nested loop with an inner-loop-carried dependence ([0,1]) and an outer-loop-carried dependence ([1,1]). We would map this loop onto a 4×4 PEA with pipelining. We assume that the latency of executing one loop iteration is Lb, and the number of iterations involved at one cycle in the kernel phase of pipelining is Wk. As there is an inner loop dependence ([0,1]), the initiation interval (IIi) of inner loop pipelining could be minimized to 1 and we get Wk = 4. We also note that the angle α is contained by two sides in Figure 1(b), which could be written as follows: tan(α) = Wk/Lb = 1/IIi.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128422938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FPGA Acceleration for Simultaneous Medical Image Reconstruction and Segmentation
Peng Li, Thomas Page, Guojie Luo, Wentai Zhang, Pei Wang, Peng Zhang, P. Maass, M. Jiang, J. Cong
{"title":"FPGA Acceleration for Simultaneous Medical Image Reconstruction and Segmentation","authors":"Peng Li, Thomas Page, Guojie Luo, Wentai Zhang, Pei Wang, Peng Zhang, P. Maass, M. Jiang, J. Cong","doi":"10.1109/FCCM.2014.54","DOIUrl":"https://doi.org/10.1109/FCCM.2014.54","url":null,"abstract":"The conventional approach in computed tomography (CT) is to solve each image processing task individually in sequence. An obvious drawback is that the measured data is used only once, at the first step, and possible errors, from noise in the measured data, inappropriate modeling, or inappropriate parameters, are not easy to correct and will propagate into the later steps. As a consequence, approaches that combine the reconstruction and the specific processing task have become popular. This work adopts an iterative algorithm with simultaneous reconstruction and segmentation using the Mumford-Shah model, which can be applied not only to regularize the ill-posedness of the tomographic reconstruction problem, but also to compute segmentation directly from the measured data. The Mumford-Shah model is both mathematically and computationally difficult. In this paper, we accelerate this computation- and data-intensive application with FPGA devices and achieve a 9.24X speedup over the conventional CPU implementation.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126805169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Image Signal Processors on FPGAs
Di Wu, Andreas Moshovos
{"title":"Image Signal Processors on FPGAs","authors":"Di Wu, Andreas Moshovos","doi":"10.1109/FCCM.2014.58","DOIUrl":"https://doi.org/10.1109/FCCM.2014.58","url":null,"abstract":"An Image Signal Processor (ISP) converts raw imaging sensor data into a format appropriate for further processing and human inspection. This work explores FPGA-based ISP designs considering specialized and programmable implementations and proposes an ISP using a programmable generic processing unit with comparable performance versus the dedicated implementations.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115510570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Accelerator of Stacked Convolutional Independent Subspace Analysis for Deep Learning-Based Action Recognition
Lu He, Yan Luo, Yu Cao
{"title":"Accelerator of Stacked Convolutional Independent Subspace Analysis for Deep Learning-Based Action Recognition","authors":"Lu He, Yan Luo, Yu Cao","doi":"10.1109/FCCM.2014.37","DOIUrl":"https://doi.org/10.1109/FCCM.2014.37","url":null,"abstract":"Action recognition has been a research challenge in multimedia computing and machine vision. Recent advances in deep learning combined with stacked convolutional Independent Subspace Analysis (ISA) have achieved performance superior to all previously published results on several publicly available data sets. Unfortunately, one major issue in large-scale deployment of this new deep learning-based approach is the unacceptable latency of training with high-dimensional data. In this paper, we propose a new hardware accelerator that can reduce the training time substantially for deep learning-based action recognition. Specifically, our proposed approach focuses on accelerating the convolutional stacked ISA algorithm, the core component of the deep learning-based action recognition algorithms. We design parallel pipelines, data parallelism, and look-up tables to speed up the algorithm. With an embedded heterogeneous platform consisting of a general purpose processor and an FPGA, we are able to achieve up to 10X speedup for stacked ISA training compared to a software-only implementation.","PeriodicalId":246162,"journal":{"name":"2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128588776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3