2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines: Latest Publications

Kung Fu Data Energy - Minimizing Communication Energy in FPGA Computations
E. Kadrić, K. Mahajan, A. DeHon
DOI: 10.1109/FCCM.2014.66 (https://doi.org/10.1109/FCCM.2014.66); published 2014-05-11
Abstract: The energy in FPGA computations can be dominated by data communication energy, either in the form of memory references or data movement on interconnect (e.g., over 75% of energy for single-processor Gaussian Mixture Modeling, Window Filtering, and FFT). In this paper, we explore how to use data placement and parallelism to reduce communication energy. We further introduce a new architecture for embedded memories, the Continuous Hierarchy Memory (CHM), and show that it increases the opportunities to reduce energy through strategic data placement. For three common FPGA tasks in signal and image processing (Gaussian Mixture Modeling, Window Filters, and FFTs), we show that data movement energy can vary by more than a factor of 9. The best solutions exploit parallelism and hierarchy and are 1.8-6.0× more energy-efficient than designs that place all data in a large memory bank. With the CHM, we obtain an additional 10% improvement for full-voltage logic and 30-80% when operating the computation at reduced voltage.
Citations: 11

A Fully Pipelined and Dynamically Composable Architecture of CGRA
J. Cong, Hui Huang, Chiyuan Ma, Bingjun Xiao, Peipei Zhou
DOI: 10.1109/FCCM.2014.12 (https://doi.org/10.1109/FCCM.2014.12); published 2014-05-11
Abstract: Future processor chips will be constrained mainly by energy efficiency rather than by transistor resources. Reconfigurable fabrics achieve higher energy efficiency than CPUs through customized hardware that adapts to user applications. Among reconfigurable fabrics, coarse-grained reconfigurable arrays (CGRAs) can be even more efficient than fine-grained FPGAs when the target applications do not need bit-level customization. CGRAs were originally developed in an era when transistor resources were more critical than energy efficiency. Previous work shares hardware among different operations via modulo scheduling and time multiplexing of processing elements. In this work, we focus on an emerging scenario in which transistor resources are abundant. We develop a novel CGRA architecture that enables full pipelining and dynamic composition, improving energy efficiency by taking full advantage of the abundant transistors, and we solve several new design challenges along the way. We implement a prototype of the proposed architecture on a commodity FPGA chip for verification. Experiments show that our architecture can fully exploit the energy benefits of customization for user applications when transistor resources are rich.
Citations: 73

Fast, Power-Efficient Biophotonic Simulations for Cancer Treatment Using FPGAs
Jeffrey Cassidy, L. Lilge, Vaughn Betz
DOI: 10.1109/FCCM.2014.45 (https://doi.org/10.1109/FCCM.2014.45); published 2014-05-11
Abstract: Biophotonics, the study of light propagation through living tissue, is important for many medical applications, ranging from imaging and detection to therapy for conditions such as cancer. Effective medical use of light depends on simulating its propagation through highly scattering tissue. Monte Carlo simulation of photon migration has been adopted as the "gold standard" for its ability to capture complicated geometries and model all of the relevant physics. This accuracy and generality come at a high computational cost, which limits the technique's utility. Greatly generalizing previous work, we present the first and only hardware-accelerated Monte Carlo biophotonic simulator that can accept complicated geometries described by tetrahedral meshes. Implemented on an Altera Stratix V FPGA, it achieves a 4x speedup and 67x higher energy efficiency than a tightly optimized multi-threaded CPU implementation, with demonstrated potential to extend the performance gain to 15-20x, which would enable important clinical and research applications.
Citations: 13

Compiling Higher Order Functional Programs to Composable Digital Hardware
E. Aguilar-Pelaez, Samuel Bayliss, Alex I. Smith, F. Winterstein, D. Ghica, David B. Thomas, G. Constantinides
DOI: 10.1109/FCCM.2014.69 (https://doi.org/10.1109/FCCM.2014.69); published 2014-05-11
Abstract: This work demonstrates the capabilities of a high-level synthesis tool-chain that compiles higher-order functional programs to gate-level hardware descriptions. Higher-order programming allows functions to take functions as parameters. In a hardware context, the latency-insensitive interfaces generated between compiled modules enable late binding with libraries of pre-existing functions at the place-and-route compilation stage. We demonstrate the completeness and utility of our approach with a case study: a recursive k-means clustering algorithm. The algorithm features complex data-dependent control flow and opportunities to exploit both coarse- and fine-grained parallelism.
Citations: 2

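To illustrate the higher-order idea from this abstract in software terms, the Python sketch below passes the distance metric into a k-means routine as a function parameter, so a different metric can be bound without touching the clustering code. It is plain Lloyd's iteration, not the recursive k-means case study or the paper's tool-chain; `kmeans`, `squared_euclidean`, and the iteration cap are hypothetical names and choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def squared_euclidean(a: Point, b: Point) -> float:
    """Default metric; any function with the same signature can be passed in."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a: Point, b: Point) -> float:
    """An alternative metric, bound at call time."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points: Sequence[Point], centres: Sequence[Point],
           dist: Metric = squared_euclidean, max_iters: int = 20) -> List[Point]:
    """Lloyd's k-means in which the metric `dist` is itself a function parameter."""
    centres = [tuple(c) for c in centres]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster whose centre is nearest under `dist`.
        clusters: List[List[Point]] = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda k: dist(p, centres[k]))
            clusters[nearest].append(p)
        # Update step: move each centre to its members' mean; an empty cluster keeps its centre.
        new_centres = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centres[k]
            for k, members in enumerate(clusters)
        ]
        if new_centres == centres:  # assignments have stabilised
            break
        centres = new_centres
    return centres


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))             # [(0.0, 0.5), (10.0, 10.5)]
print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)], manhattan))  # same clusters, different metric
```

Keeping the metric as a parameter mirrors, at a very small scale, the late binding the abstract describes: the clustering routine is fixed, while the function it calls is chosen by the caller.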
A New Algorithm for Carry-Free Addition of Binary Signed-Digit Numbers
K. Schneider, Adrian Willenbücher
DOI: 10.1109/FCCM.2014.24 (https://doi.org/10.1109/FCCM.2014.24); published 2014-05-11
Abstract: Signed-digit (SD) numbers generalize traditional radix numbers by allowing negative digits within a certain range. Typically, this leads to redundant number representations that can be used to avoid the carry-propagation problem of radix-number addition. Unfortunately, as proved by Avizienis, the standard algorithm for carry-free addition of SD numbers does not work for the binary case. In this paper, we therefore construct a special algorithm for carry-free addition and subtraction of binary SD numbers, i.e., addition and subtraction of n-digit numbers are performed with circuits of depth O(1) and size O(n). This is possible by computing, in addition to the transfer digits used by the standard algorithm, one additional bit that allows us to distinguish the relevant cases and avoid propagating dependencies. The additional bit and the transfer digit used to compute the sum digit at position i depend only on the summands' digits at positions i and i - 1, so all sum digits can be computed by a hardware circuit whose depth is independent of the number of digits. We first explain the basics of the standard addition algorithm to derive the additional information needed to fix the algorithm for the binary case. After proving the correctness of our algorithm, we present experimental results showing that our implementation clearly outperforms two's complement addition even for small numbers and saves 50% of the required chip area compared to other carry-free implementations.
Citations: 5

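The carry-free idea can be made concrete with a short Python sketch of a textbook-style binary SD adder (digits in {-1, 0, 1}, least significant first). The transfer digit chosen at position i looks one position down, at whether both lower-order digits are non-negative, which bounds the incoming transfer so that no sum digit can overflow. This is a commonly taught scheme used here as an assumption, not necessarily the exact algorithm or extra bit defined in the paper; `sd_add` and `sd_value` are hypothetical names.

```python
from typing import List, Sequence


def sd_value(digits: Sequence[int]) -> int:
    """Integer value of a binary SD number given least-significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def sd_add(x: Sequence[int], y: Sequence[int]) -> List[int]:
    """Carry-free addition of two n-digit binary SD numbers (digits in {-1, 0, 1}).

    Returns n + 1 digits. Iteration i reads only x[i], y[i], x[i-1], y[i-1], so each
    sum digit depends on at most positions i, i-1 and i-2: constant depth in hardware.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("operands must have the same number of digits")
    t = [0] * (n + 1)  # t[i]: transfer digit into position i
    w = [0] * n        # interim sum digits
    for i in range(n):
        p = x[i] + y[i]  # position sum in {-2, ..., 2}
        # If both lower digits are >= 0, the incoming transfer t[i] is in {0, 1};
        # otherwise it is in {-1, 0}. Choose w[i] so that w[i] + t[i] stays in {-1, 0, 1}.
        lower_nonneg = i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0)
        if p == 2:
            t[i + 1], w[i] = 1, 0
        elif p == -2:
            t[i + 1], w[i] = -1, 0
        elif p == 1:
            t[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        elif p == -1:
            t[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
        # p == 0 leaves t[i + 1] = w[i] = 0
    # p == 2 * t[i + 1] + w[i] at every position, so the value is preserved.
    return [w[i] + t[i] for i in range(n)] + [t[n]]


print(sd_add([1, 0], [1, 1]))  # 1 + 3 = 4 -> [0, 0, 1]
```

The Python loop runs sequentially, but no iteration uses a result produced by another iteration's sum digit, which is what allows a circuit to evaluate every position at once.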
Experiments in Mapping Expressions to DSP Blocks
Bajaj Ronak, Suhaib A. Fahmy
DOI: 10.1109/FCCM.2014.34 (https://doi.org/10.1109/FCCM.2014.34); published 2014-05-11
Abstract: Mapping complex mathematical expressions to DSP blocks by relying on synthesis from pipelined code is inefficient and results in significantly reduced throughput. We have developed a tool that demonstrates the benefit of considering the structure and pipeline arrangement of the DSP block when mapping functions. Implementations that take the DSP block's structure into account during pipelining achieve double the throughput of other methods, demonstrating that the structure of the DSP block must be considered when scheduling complex expressions.
Citations: 2

FPGA Implementation of EM Algorithm for 3D CT Reconstruction
Young-kyu Choi, J. Cong, Di Wu
DOI: 10.1109/FCCM.2014.48 (https://doi.org/10.1109/FCCM.2014.48); published 2014-05-11
Abstract: Although the expectation maximization (EM)-based 3D computed tomography (CT) reconstruction algorithm lowers radiation exposure, its long execution time hinders practical use. To accelerate this process, we introduce a novel strategy that reduces external memory bandwidth by reusing both the sinogram and the voxel intensities. We also present a customized computing engine based on a field-programmable gate array (FPGA) to increase the effective memory bandwidth. Experiments on actual patient data show that an 85X speedup over a single-threaded CPU can be achieved.
Citations: 9

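Why both sinogram and voxel data are reused becomes clearer from the shape of an EM iteration: a forward projection reads voxels to predict the measurements, and a back projection reads the measurement ratios to update every voxel. The Python sketch below assumes the classic emission-tomography MLEM update, x_j ← (x_j / Σ_i a_ij) · Σ_i a_ij · y_i / (Ax)_i, on a tiny dense system matrix; the paper's exact EM variant for CT and its 3D data layout may differ, and `mlem_step`, `forward_project`, and the 3×3 example are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def forward_project(a: Matrix, x: Sequence[float]) -> List[float]:
    """Predicted measurements (Ax)_i: each ray sum weights voxel j by a_ij."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def mlem_step(a: Matrix, y: Sequence[float], x: Sequence[float]) -> List[float]:
    """One MLEM iteration: x_j <- x_j / s_j * sum_i a_ij * y_i / (Ax)_i."""
    n = len(x)
    ax = forward_project(a, x)                                   # pass 1: reads all voxels
    ratio = [y_i / ax_i if ax_i > 0 else 0.0 for y_i, ax_i in zip(y, ax)]
    sens = [sum(row[j] for row in a) for j in range(n)]          # s_j = column sums of A
    back = [sum(row[j] * r for row, r in zip(a, ratio)) for j in range(n)]  # pass 2: reads all rays
    return [x[j] * back[j] / sens[j] if sens[j] > 0 else x[j] for j in range(n)]


A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
y = forward_project(A, [1.0, 2.0, 3.0])  # noise-free toy "sinogram": [3.0, 5.0, 4.0]
x = [1.0, 1.0, 1.0]                      # flat, strictly positive starting image
for _ in range(200):
    x = mlem_step(A, y, x)               # iterates stay non-negative
```

Each iteration sweeps the whole image and the whole sinogram twice, which is why, in a sketch like this, off-chip traffic rather than arithmetic tends to bound the runtime; on this consistent toy system the iterates should move toward [1, 2, 3].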
Mapping Tasks to a Dynamically Reconfigurable Coarse Grained Array
M. S. Moghaddam, K. Paul, M. Balakrishnan
DOI: 10.1109/FCCM.2014.20 (https://doi.org/10.1109/FCCM.2014.20); published 2014-05-11
Abstract: Coarse-Grained Reconfigurable Architectures (CGRAs) have become popular in recent times, as increased transistor densities have enabled greater integration of increasingly complex "compute cores". These devices pack massive compute power and can be used effectively to build efficient solutions for applications with a significant degree of parallelism. In many cases, these CGRAs are also partially reconfigurable. To make effective use of these highly parallel compute platforms, a good mapping flow is required to map the parallelism present in a target application.
Citations: 0

GraphGen: An FPGA Framework for Vertex-Centric Graph Computation
E. Nurvitadhi, G. Weisz, Yu Wang, Skand Hurkat, Marie Nguyen, J. Hoe, José F. Martínez, Carlos Guestrin
DOI: 10.1109/FCCM.2014.15 (https://doi.org/10.1109/FCCM.2014.15); published 2014-05-11
Abstract: Vertex-centric graph computations are widely used in many machine learning and data mining applications that operate on graph data structures. This paper presents GraphGen, a vertex-centric framework that targets FPGAs for hardware acceleration of graph computations. GraphGen accepts a vertex-centric graph specification and automatically compiles it into an application-specific synthesized graph processor and memory system for the target FPGA platform. We report design case studies using GraphGen to implement stereo matching and handwriting recognition graph applications on Terasic DE4 and Xilinx ML605 FPGA boards. Results show speedups of up to 14.6× and 2.9× over software on an Intel Core i7 CPU for the two applications, respectively.
Citations: 115

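The vertex-centric model can be illustrated with a small synchronous Python sketch: every vertex runs the same update function over its in-neighbours' values from the previous superstep, and execution stops once no value changes. The `run_vertex_program` driver, the adjacency-list format, and the shortest-path update are hypothetical illustrations; they are not GraphGen's specification format or API.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical graph format: vertex -> list of (successor, edge weight).
Graph = Dict[int, List[Tuple[int, float]]]
# update(vertex, current value, [(in-neighbour value, edge weight), ...]) -> new value
Update = Callable[[int, float, List[Tuple[float, float]]], float]


def run_vertex_program(graph: Graph, init: Callable[[int], float], update: Update,
                       max_supersteps: int = 1000) -> Dict[int, float]:
    """Synchronous vertex-centric execution until no vertex value changes."""
    incoming: Dict[int, List[Tuple[int, float]]] = {v: [] for v in graph}
    for u, edges in graph.items():
        for v, w in edges:
            incoming.setdefault(v, []).append((u, w))
    values = {v: init(v) for v in incoming}
    for _ in range(max_supersteps):
        prev = dict(values)  # every vertex reads the previous superstep's values
        for v, preds in incoming.items():
            values[v] = update(v, prev[v], [(prev[u], w) for u, w in preds])
        if values == prev:   # converged: no vertex changed
            break
    return values


def sssp_update(v: int, dist: float, msgs: List[Tuple[float, float]]) -> float:
    """Bellman-Ford-style relaxation: keep the shortest distance offered by any in-edge."""
    return min([dist] + [d + w for d, w in msgs])


g: Graph = {0: [(1, 4.0), (2, 1.0)], 1: [(3, 1.0)], 2: [(1, 2.0)], 3: []}
print(run_vertex_program(g, lambda v: 0.0 if v == 0 else math.inf, sssp_update))
# {0: 0.0, 1: 3.0, 2: 1.0, 3: 4.0}
```

Because each vertex's update reads only the previous superstep's values, all vertices in a superstep are independent, which is the property a hardware graph processor can exploit by updating many vertices in parallel.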
Building Optimized Packet Filters with COFFi
Sven Hager, F. Winkler, B. Scheuermann, Klaus Reinhardt
DOI: 10.1109/FCCM.2014.38 (https://doi.org/10.1109/FCCM.2014.38); published 2014-05-11
Abstract: Many companies and institutions employ packet filter firewalls to regulate network traffic effectively. Unfortunately, the constant growth of network bandwidth makes matching packet headers against potentially large rulesets increasingly difficult and rules out purely software-based firewalls, which cannot cope with such huge amounts of traffic. Instead, high-speed firewalls are often implemented in ASICs, which offer a high degree of parallelism, many opportunities for operation pipelining, and low-latency access to network data. However, due to their static nature, ASICs must provide generic filtering circuitry that can hardly take full advantage of firewall ruleset properties, which wastes hardware resources.
Citations: 1

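The header-matching task this abstract refers to can be pinned down with a first-match packet classifier: rules are checked in priority order, and the first rule whose fields all match decides the action. The Python sketch below uses only the standard `ipaddress` module; the `Rule` fields, the default-drop policy, and the example rules are assumptions for illustration, not COFFi's rule format or its optimizations.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    src: IPv4Network
    dst: IPv4Network
    dport: Tuple[int, int]  # inclusive destination-port range
    proto: Optional[int]    # IP protocol number; None matches any protocol
    action: str


def classify(rules: Sequence[Rule], src: str, dst: str, dport: int, proto: int,
             default: str = "drop") -> str:
    """Return the action of the first matching rule, else the default policy."""
    s, d = IPv4Address(src), IPv4Address(dst)
    for r in rules:  # list order encodes rule priority
        if (s in r.src and d in r.dst
                and r.dport[0] <= dport <= r.dport[1]
                and (r.proto is None or r.proto == proto)):
            return r.action
    return default


RULES = [
    Rule(IPv4Network("10.0.0.0/8"), IPv4Network("0.0.0.0/0"), (22, 22), 6, "drop"),
    Rule(IPv4Network("10.0.0.0/8"), IPv4Network("0.0.0.0/0"), (0, 65535), None, "accept"),
]
print(classify(RULES, "10.1.2.3", "192.168.0.1", 22, 6))    # drop: TCP/22 hits rule 1
print(classify(RULES, "10.1.2.3", "192.168.0.1", 80, 6))    # accept: falls through to rule 2
print(classify(RULES, "172.16.0.1", "192.168.0.1", 80, 6))  # drop: no rule matches
```

This loop checks rules one after another; a hardware filter would typically compare a header against all rules in parallel and then select the highest-priority match, which is where ruleset-specific circuitry can save resources.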