2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines: Latest Publications

An Efficient Architecture for Floating-Point Eigenvalue Decomposition

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.27

Xinying Wang, Joseph Zambreno

Citations: 3

High-Throughput Fixed-Point Object Detection on FPGAs

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.40

Xiaoyin Ma, W. Najjar, A. Roy-Chowdhury

Abstract: Computer vision applications make extensive use of floating-point number representation, in both single and double precision. The major advantage of floating-point representation is the very large range of values that can be represented with a limited number of bits. Most CPU and all GPU designs have been extensively optimized for short-latency, high-throughput processing of floating-point operations. On an FPGA, the bit-width of operands is a major determinant of resource utilization, achievable clock frequency, and hence throughput. By using a fixed-point representation with fewer bits, an application developer could implement more processing units at a higher clock frequency and obtain dramatically larger throughput. However, smaller bit-widths may lead to inaccurate or incorrect results. Object and human detection are fundamental problems in computer vision and a very active research area. In these applications, high throughput and economy of resources are highly desirable features, allowing the applications to be embedded in mobile or field-deployable equipment. The Histogram of Oriented Gradients (HOG) algorithm [1], developed for human detection and later extended to object detection, is one of the most successful and popular algorithms in its class. In this algorithm, object descriptors are extracted from a detection window with grids of overlapping blocks. Each block is divided into cells in which histograms of intensity gradients are collected as HOG features. Vectors of histograms are normalized and passed to a Support Vector Machine (SVM) classifier to recognize a person or an object.
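The HOG feature-extraction steps summarized in the abstract (per-cell gradient histograms, then normalization) can be sketched in Python as below. This is a minimal illustration, not the authors' fixed-point FPGA design: it assumes 9 unsigned orientation bins over 0 to 180 degrees, centred [-1, 0, 1] derivative masks, hard (non-interpolated) binning, and plain L2 normalization. The helper names `to_fixed`, `cell_histogram`, and `l2_normalize` and the optional `frac_bits` argument are hypothetical, added only to show where rounding to a fixed-point format would enter.

```python
import math


def to_fixed(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (hypothetical fixed-point model)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def cell_histogram(img, x0, y0, size, nbins=9, frac_bits=None):
    """Orientation histogram of one size x size cell whose top-left pixel is (x0, y0).

    img is a list of rows of intensities. The cell must have a 1-pixel border
    inside img so the centred differences below stay in bounds.
    """
    hist = [0.0] * nbins
    bin_width = 180.0 / nbins  # unsigned gradients: 0 to 180 degrees
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            # Centred [-1, 0, 1] derivative masks.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if frac_bits is not None:
                mag = to_fixed(mag, frac_bits)  # mimic a narrower datapath
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / bin_width), nbins - 1)
            hist[b] += mag  # hard binning, no interpolation between bins
    return hist


def l2_normalize(vec, eps=1e-6):
    """Plain L2 normalization of a concatenated histogram vector."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]
```

For a horizontal intensity ramp (`img[y][x] = x`), every pixel has gradient (2, 0), so `cell_histogram(img, 1, 1, 4)` on a 6 x 6 ramp puts all of its weight, 16 x 2 = 32, in bin 0. Passing a small `frac_bits` rounds any magnitude that is not a multiple of 2**-frac_bits, which is the accuracy versus bit-width trade-off the abstract describes.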

Citations: 9

Better-Than-DMR Techniques for Yield Improvement

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.21

S. Sanae, Yuko Hara-Azumi, S. Yamashita, Y. Nakashima

Citations: 0

A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.23

J. Fowers, Kalin Ovtcharov, K. Strauss, Eric S. Chung, G. Stitt

Abstract: Sparse matrix-vector multiplication (SMVM) is a crucial primitive used in a variety of scientific and commercial applications. Despite having significant parallelism, SMVM is a challenging kernel to optimize due to its irregular memory access characteristics. Numerous studies have proposed the use of FPGAs to accelerate SMVM implementations. However, most prior approaches focus on parallelizing multiply-accumulate operations within a single row of the matrix (which limits parallelism if rows are small) and/or make inefficient use of the memory system when fetching matrix and vector elements. In this paper, we introduce an FPGA-optimized SMVM architecture and a novel sparse matrix encoding that explicitly exposes parallelism across rows, while keeping the hardware complexity and on-chip memory usage low. This system compares favorably with prior FPGA SMVM implementations. Across the more than 700 University of Florida sparse matrices we evaluated, it also performs within about two thirds of CPU SMVM performance on average, even though it has 2.4x lower DRAM memory bandwidth, and within almost one third of GPU SMVM performance on average, even at 9x lower memory bandwidth. Additionally, it consumes only 25 W, for power efficiencies 2.6x and 2.3x higher than CPU and GPU, respectively, based on maximum device power.
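For reference, the row-oriented SMVM baseline that the abstract contrasts against can be sketched with the standard compressed sparse row (CSR) format. This is a hypothetical illustration of CSR, not the paper's novel encoding, which the abstract does not describe in detail; the function names are illustrative. The inner loop accumulates one row at a time, so short rows leave little work to parallelize, and the `x[col_idx[k]]` gather is the kind of irregular memory access the abstract mentions.

```python
def dense_to_csr(rows):
    """Convert a dense matrix (list of rows) to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for A stored in CSR, one row at a time."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        # Multiply-accumulate within a single row; x is gathered via col_idx.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

For example, the matrix `[[1, 0, 2], [0, 0, 3], [4, 5, 0]]` times `[1, 2, 3]` gives `[7, 9, 14]`; row 1 holds a single nonzero, so a per-row accumulator has only one multiply to do there.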

Citations: 119

On Hard Adders and Carry Chains in FPGAs

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.25

J. Luu, Conor McCullough, Sen Wang, Safeen Huda, Bo Yan, Charles Chiasson, K. Kent, J. Anderson, Jonathan Rose, Vaughn Betz

Abstract: Hardened adder and carry logic is widely used in commercial FPGAs to improve the efficiency of arithmetic functions. There are many design choices and complexities associated with such hardening, including circuit design, FPGA architectural choices, and the CAD flow. There has been very little study of these choices, however, and hence we explore a number of possibilities for hard adder design. We also highlight optimizations during front-end elaboration that help ameliorate the restrictions placed on logic synthesis by hardened arithmetic. We show that hard adders and carry chains, when used for simple adders, increase performance by a factor of four or more, but on larger benchmark designs that contain arithmetic, they improve overall performance by roughly 15%. We measure an average area increase of 5% for architectures with carry chains but believe that better logic synthesis should reduce this penalty. Interestingly, we show that adding dedicated inter-logic-block carry links or fast carry look-ahead hardened adders results in only minor delay improvements for complete designs.
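The carry dependency that hardened carry chains target can be shown at the bit level. The sketch below assumes unsigned operands of a given `width` and uses hypothetical helper names. `ripple_carry_add` computes each carry from the previous one, a serial chain of `width` steps; `lookahead_carries` computes the same carries from generate (g) and propagate (p) signals using the expanded look-ahead equation, so no carry waits on its neighbour. It is a functional model only and says nothing about the delay or area figures reported in the abstract.

```python
def ripple_carry_add(a, b, width, cin=0):
    """Add two unsigned width-bit integers one full adder at a time.

    Returns (sum_bits, carries), where carries[i] is the carry into bit i
    and carries[width] is the carry out.
    """
    carries = [cin]
    s = 0
    for i in range(width):
        ai, bi, ci = (a >> i) & 1, (b >> i) & 1, carries[i]
        s |= (ai ^ bi ^ ci) << i
        # Each carry depends on the previous one: the serial ripple path.
        carries.append((ai & bi) | (ci & (ai ^ bi)))
    return s, carries


def lookahead_carries(a, b, width, cin=0):
    """Compute the same carries with the flattened carry look-ahead equations."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    carries = [cin]
    for i in range(1, width + 1):
        # c_i = OR_j (g_j AND p_{j+1} ... p_{i-1})  OR  (p_0 ... p_{i-1} AND cin)
        c = 0
        for j in range(i):
            term = g[j]
            for k in range(j + 1, i):
                term &= p[k]
            c |= term
        term = cin
        for k in range(i):
            term &= p[k]
        c |= term
        carries.append(c)
    return carries
```

For `a = 0b1011` (11) and `b = 0b0110` (6) at `width = 4`, the ripple adder returns sum bits `0b0001` with carries `[0, 0, 1, 1, 1]`, i.e. 17 with a carry out; `lookahead_carries` yields the same carry list.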

Citations: 26

From GPU to FPGA: A Pipelined Hierarchical Approach to Fast and Memory-Efficient NDN Name Lookup

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.39

Yanbiao Li, Dafang Zhang, Xian Yu, Jing Long, W. Liang

Citations: 1

GROK-INT: Generating Real On-Chip Knowledge for Interconnect Delays Using Timing Extraction

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.31

Benjamin Gojman, A. DeHon

Citations: 12

Harmonica: An FPGA-Based Data Parallel Soft Core

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.53

C. Kersey, S. Yalamanchili, Hyojong Kim, Nimit Nigania, Hyesoon Kim

Citations: 4

An Architectural Approach to Characterizing and Eliminating Sources of Inefficiency in a Soft Processor Design

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.51

Kaveh Aasaraai, Andreas Moshovos

Citations: 1

Memory Optimized Re-gridding for Non-uniform Fast Fourier Transform on FPGAs

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2014-05-11 DOI: 10.1109/FCCM.2014.35

Umer I. Cheema, G. Nash, R. Ansari, A. Khokhar

Citations: 0