FPGA. ACM International Symposium on Field-Programmable Gate Arrays: Latest Articles

High throughput and programmable online traffic classifier on FPGA
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435307
Da Tong, Lu Sun, Kiran Kumar Matam, V. Prasanna
Pages: 255-264
Abstract: Machine learning (ML) algorithms have been shown to be effective in classifying today's dynamic internet traffic. Using additional features and sophisticated ML techniques can improve accuracy and can classify a broad range of application classes. Realizing such classifiers to meet high data rates is challenging. In this paper, we propose two architectures to realize a complete online traffic classifier using flow-level features. First, we develop a traffic classifier based on the C4.5 decision tree algorithm and the Entropy-MDL discretization algorithm. It achieves an accuracy of 97.92% when classifying a traffic trace consisting of eight application classes. Next, we accelerate our classifier using two architectures on FPGA. One architecture stores the classifier in on-chip distributed RAM. It is designed to sustain a high throughput. The other architecture stores the classifier in block RAM. It is designed to operate with a small hardware footprint and thus can be built at low hardware cost. Experimental results show that our high throughput architecture can sustain a throughput of 550 Gbps assuming a 40-byte packet size. Our low cost architecture demonstrates a 22% better resource efficiency than the high throughput design. It can be easily replicated to achieve 449 Gbps while supporting 160 input traffic streams concurrently. Both architectures are parameterizable and programmable to support any binary-tree-based traffic classifier. We develop a tool which allows users to easily map a binary-tree-based classifier to hardware. The tool takes a classifier as input and automatically generates the Verilog code for the corresponding hardware architecture.
Citations: 28
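The abstract above describes a binary-tree-based classifier held in on-chip memory and applied to discretized flow-level features. As a rough software illustration of that idea only, the sketch below stores a tree as a flat table of nodes and walks it from the root. The node layout, the feature values, and the class labels are all hypothetical; they are not taken from the paper, and the hardware architectures it proposes are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    """One row of a flat tree table (hypothetical layout, not the paper's)."""
    feature: int = -1            # index of the flow feature tested (unused on leaves)
    threshold: int = 0           # go left when feature value <= threshold
    left: int = -1               # table index of the left child
    right: int = -1              # table index of the right child
    label: Optional[str] = None  # application class; set only on leaves


def classify(tree: Sequence[Node], features: Sequence[int]) -> str:
    """Walk the table from the root (index 0) until a leaf is reached."""
    idx = 0
    while tree[idx].label is None:
        node = tree[idx]
        idx = node.left if features[node.feature] <= node.threshold else node.right
    return tree[idx].label


# Hypothetical toy tree over two discretized flow features.
TREE = [
    Node(feature=0, threshold=3, left=1, right=2),  # root
    Node(label="dns"),
    Node(feature=1, threshold=5, left=3, right=4),
    Node(label="http"),
    Node(label="p2p"),
]

if __name__ == "__main__":
    print(classify(TREE, [2, 9]))
    print(classify(TREE, [7, 6]))
```

Because the tree is plain data rather than code, swapping in a different table changes the classifier without touching the lookup routine, which is the sense in which a table-driven design is programmable.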
Improving bitstream compression by modifying FPGA architecture
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435294
S. A. Razavi, M. S. Zamani
Pages: 167-170
Abstract: The size of configuration bitstreams of field-programmable gate arrays (FPGAs) is increasing rapidly. Compression techniques are used to decrease the size of bitstreams. In this paper, an appropriate bitstream format and variable symbol lengths are proposed to utilize the routing patterns for enhancing the compression efficiency. An order of inputs of multiplexers in switch modules is also proposed to improve the symbol statistics and hence the compression efficiency. A framework to generate the bitstream and hardware description of FPGAs is developed as well. Experimental results over 20 MCNC benchmarks show that by applying the proposed approaches, the compression rate is improved by 46% on average compared to methods with fixed symbol lengths, without any area or performance degradation.
Citations: 2
High-level synthesis with LegUp: a crash course for users and researchers
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435269
J. Anderson, S. Brown, Andrew Canis, Jongsok Choi
Pages: 7-8
Abstract: High-level synthesis (HLS) has been gaining traction recently as a design methodology for FPGAs, with the promise of raising the productivity of FPGA hardware designers, and ultimately, opening the door to the use of FPGAs as computing devices targetable by software engineers. In this tutorial, we introduce LegUp, an open-source HLS tool for FPGAs developed at the University of Toronto. With LegUp, a user can compile a C program completely to hardware, or alternately, he/she can choose to compile the program to a hybrid hardware/software system comprising a processor along with one or more accelerators. LegUp supports the synthesis of most of the C language to hardware, including loops, structs, multi-dimensional arrays, pointer arithmetic, and floating point operations. The LegUp distribution includes the CHStone HLS benchmark suite, as well as a test suite and associated infrastructure for measuring quality of results, and for verifying the functionality of LegUp-generated circuits. LegUp is freely downloadable at www.legup.org, providing a powerful platform that can be leveraged for new high-level synthesis research.
Citations: 4
Architecture support for custom instructions with memory operations
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435303
J. Cong, Karthik Gururaj
Pages: 231-234
Abstract: Customized instructions (CIs) implemented using custom functional units (CFUs) have been proposed as a way of improving the performance and energy efficiency of software while minimizing the cost of designing and verifying accelerators from scratch. However, previous work allows CIs to communicate with the processor only through registers or with limited memory operations. In this work we propose an architecture that allows CIs to seamlessly execute memory operations without any special synchronization operations to guarantee program order of instructions. Our results show that our architecture can provide 24% energy savings with 14% performance improvement for 2-issue and 4-issue superscalar processor cores.
Citations: 4
Towards automatic customization of interconnect and memory in the CoRAM abstraction (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435311
Eric S. Chung, Michael Papamichael
Pages: 265
Abstract: When developing applications to run on FPGAs, we tend to expend great effort on crafting the custom hardware acceleration datapath, but blindly turn to the FPGA vendor tool/library to provide default solutions for on-chip interconnect and external interfaces. This often leads to ineffective communication- or memory-bound implementations since the design and tuning of the default general-purpose solutions necessarily makes design compromises for generality. This is counterproductive as the FPGA's flexible reconfigurability should afford us great opportunities for performance gain and cost reduction through extensive application-specific customization of the interconnect and interface IPs. This work presents a compiler that generates custom interconnect topology and connectivity with appropriately scaled capacity to support an application's exact communication requirements at a minimized cost. More specifically, the compiler analyzes an application developed for the CoRAM abstraction [1,2] for its connectivity and bandwidth requirements between the hardware processing kernels and external DRAM banks. The result is an extremely fine-tuned custom-topology soft-logic network-on-chip interconnect, which is enabled by the CONNECT NoC framework [3]. We perform an extensive evaluation that benchmarks two applications against the standard CoRAM implementation flow that relies on a fixed generically-tuned general-purpose soft-logic network-on-chip. Our RTL-driven evaluation shows a large opportunity for area reduction and improved efficiency (up by 48%) without any impact on application performance.
Citations: 0
Word-length optimization beyond straight line code
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435285
D. Boland, G. Constantinides
Pages: 105-114
Abstract: The silicon area benefits that result from word-length optimization have been widely reported by the FPGA community. However, to date, most approaches are restricted to straight line code, or code that can be converted into straight line code using techniques such as loop-unrolling. In this paper, we take the first steps towards creating analytical techniques to optimize the precision used throughout custom FPGA accelerators for algorithms that contain loops with data dependent exit conditions. To achieve this, we build on ideas emanating from the software verification community to prove program termination. Our idea is to apply word-length optimization techniques to find the minimum precision required to guarantee that a loop with data dependent exit conditions will terminate. Without techniques to analyze algorithms containing these types of loops, a hardware designer may elect to implement every arithmetic operator throughout a custom FPGA-based accelerator using IEEE-754 standard single or double precision arithmetic. With this approach, the FPGA accelerator would have comparable accuracy to a software implementation. However, we show that using our new technique to create custom fixed and floating point designs, we can obtain silicon area savings of up to 50% over IEEE standard single precision arithmetic, or 80% over IEEE standard double precision arithmetic, at the same time as providing guarantees that the created hardware designs will work in practice.
Citations: 11
Placement of repair circuits for in-field FPGA repair
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435286
M. Wirthlin, J. E. Jensen, Alex Wilson, W. Howes, Shi-Jie Wen, R. Wong
Pages: 115-124
Abstract: With the growing density and shrinking feature size of modern semiconductors, it is increasingly difficult to manufacture defect free semiconductors that maintain acceptable levels of reliability for long periods of time. These systems are increasingly susceptible to wear-out by failing to meet their operational specifications for an extended period of time. The reconfigurability of FPGAs can be used to repair post-manufacturing faults by configuring the FPGA to avoid a damaged resource. This paper presents a method for repairing FPGA devices with wear-out faults by precomputing a set of repair circuits that, collectively, can repair a fault found in any logic block of the FPGA. This approach relies on logic placement to create "repair" circuits that avoid specific logic blocks. Three repair placement algorithms will be presented that generate a complete set of repair designs during the conventional placement process. The number of repairs needed to create a complete repair set depends heavily on the utilization of the FPGA resources. The three algorithms are tested against several benchmarks and with multiple area constraints for each benchmark. The best repair placement approach described in the paper generates a full set of repair circuits at a computation cost of 16X that of a conventional placer and with circuits of comparable quality.
Citations: 6
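The notion of a "complete" repair set in the abstract above can be stated as a simple coverage condition: for every logic block, at least one precomputed placement leaves that block unused. The sketch below checks that condition and selects a repair for a given fault. It models each placement as just the set of block names it occupies, which is an illustrative assumption; the function names are hypothetical and the paper's three placement algorithms are not reproduced.

```python
from typing import AbstractSet, Optional, Sequence, Set


def uncovered_blocks(all_blocks: AbstractSet[str],
                     placements: Sequence[AbstractSet[str]]) -> Set[str]:
    """Blocks used by every placement, i.e. faults that no placement avoids."""
    return {b for b in all_blocks if all(b in used for used in placements)}


def pick_repair(faulty_block: str,
                placements: Sequence[AbstractSet[str]]) -> Optional[int]:
    """Index of the first placement that avoids the faulty block, else None."""
    for i, used in enumerate(placements):
        if faulty_block not in used:
            return i
    return None


# Hypothetical 4-block device whose design occupies 3 blocks per placement.
BLOCKS = {"A", "B", "C", "D"}
PARTIAL = [{"A", "B", "C"}, {"B", "C", "D"}, {"A", "C", "D"}]
COMPLETE = PARTIAL + [{"A", "B", "D"}]

if __name__ == "__main__":
    print(uncovered_blocks(BLOCKS, PARTIAL))   # block C is never avoided
    print(uncovered_blocks(BLOCKS, COMPLETE))  # empty set: repair set is complete
    print(pick_repair("C", COMPLETE))
```

In this toy case the design fills three of four blocks, so four placements are needed before every block is avoided at least once, which is consistent with the abstract's remark that the size of a complete repair set depends heavily on resource utilization.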
A novel FPGA design framework with VLSI post-routing performance analysis (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435327
Qian Zhao, Kazuki Inoue, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi
Pages: 271
Abstract: The most widely used open-source field-programmable gate array (FPGA) placement and routing tool is VPR, which can define the target FPGA, perform placement and routing, and report area and timing information. However, it cannot be used efficiently in FPGA IP design for two reasons. First, VPR cannot directly support most newly developed FPGA architectures. Modifying the C-coded VPR to evaluate a number of new architectures requires a long time. Second, the accuracy of the VPR performance results is not sufficient for evaluating a complete synthesizable FPGA IP in a design that targets LSI production. We propose an FPGA design framework that in particular improves FPGA IP design efficiency. A novel FPGA routing tool, namely EasyRouter, is developed in this framework. EasyRouter is developed using the C# language. Because an object-oriented programming method is used, the source code is smaller and easier to manage compared to VPR, which shortens the development time. By using simple HDL templates, EasyRouter can automatically generate entire chip HDL codes and the configuration bitstream. With these files, the FPGA IP can be evaluated with commercial VLSI CADs with high accuracy and reliability.
Citations: 0
Hardware implemented real-time operating system (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435314
Soon Ee Ong, Siaw Chen Lee, N. Ali
Pages: 266
Abstract: A Real-Time Operating System (RTOS) is usually implemented as a software component at the fundamental layer of an embedded system, where it consumes computing time and memory resources. This introduces extra overhead and latency to the system. In addition, the software layer of the RTOS indirectly raises the complexity of the system software. Shifting the RTOS from software to hardware is an inspiring idea that abstracts the RTOS layer out of the embedded system software framework. It helps reduce system software complexity and improves system performance by reducing RTOS overhead and latency. This paper presents a Simple and Efficient hardware-implemented Real-Time Operating System (SEOS) architected for high portability and scalability. SEOS operates at the co-processor level as an independent hardware component. It contains all essential OS services needed for embedded system design, including a kernel scheduler, inter-task communication and synchronization (i.e. mutex, semaphore, mailbox), a timer, and an IRQ handler. The application software interfaces with SEOS through a set of standard Application Programming Interfaces (APIs). Furthermore, SEOS is equipped with a Generic Bus Interface and an Interconnect Bridge to enable effortless OS porting across different processor platforms. These approaches make SEOS plug-and-play in nature. Our test results show that SEOS improves on a commercial software-based RTOS, µC/OS-II, in several areas: SEOS incurs 31.6% less overhead in context switching, improves task-level interrupt latency by 83.5%, shortens inter-task communication latency by 71.9%, and significantly improves performance jitter.
Citations: 2
A high-performance, low-energy FPGA accelerator for correntropy-based feature tracking (abstract only)
FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI: 10.1145/2435264.2435344
P. Cooke, J. Fowers, Lee Hunt, G. Stitt
Pages: 278
Abstract: Computer-vision and signal-processing applications often require feature tracking to identify and track the motion of different objects (features) across a sequence of images. Numerous algorithms have been proposed, but common measures of similarity for real-time usage are based on correlation, mean-squared error, or sum of absolute differences, which are not robust enough for safety-critical applications. To improve robustness, a recent feature-tracking algorithm called C-Flow uses correntropy from Information Theoretic Learning to significantly improve signal-to-noise ratio. In this paper, we present an FPGA accelerator for C-Flow that is typically 3.6-8.5x faster than a GPU and show that the FPGA is the only device capable of real-time usage for large features. Furthermore, we show the FPGA accelerator is more appropriate for embedded usage, with energy consumption that is 2.5-22x less than the GPU.
Citations: 2
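For readers unfamiliar with the similarity measure named in the abstract above, the sample estimator of correntropy with a Gaussian kernel averages a kernel of the pointwise differences between two signals. The sketch below implements only that standard estimator on plain Python sequences. The kernel width `sigma` and the sample data are illustrative assumptions, and nothing here reflects how C-Flow or the FPGA accelerator is actually organized.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float, sigma: float) -> float:
    """Normalized Gaussian kernel of width sigma evaluated at u."""
    return math.exp(-(u * u) / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def correntropy(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Sample correntropy: mean Gaussian kernel of the pointwise differences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum(gaussian_kernel(a - b, sigma) for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    reference = [0.0, 0.0, 0.0, 0.0]
    one_outlier = [0.0, 0.0, 0.0, 100.0]  # a single large error
    all_shifted = [2.0, 2.0, 2.0, 2.0]    # moderate error everywhere
    print(correntropy(reference, one_outlier))
    print(correntropy(reference, all_shifted))
```

The kernel saturates for large differences, so a single outlier lowers the score by at most one sample's share, whereas mean-squared error would be dominated by it. That bounded influence is the usual informal argument for correntropy's robustness.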