2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM): Latest Papers

An Out-of-Order Load-Store Queue for Spatial Computing
Lana Josipović, P. Brisk, P. Ienne
{"title":"An Out-of-Order Load-Store Queue for Spatial Computing","authors":"Lana Josipović, P. Brisk, P. Ienne","doi":"10.1145/3126525","DOIUrl":"https://doi.org/10.1145/3126525","url":null,"abstract":"The efficiency of spatial computing depends on the ability to achieve maximal parallelism. This needs memory interfaces that can correctly handle memory accesses arriving in arbitrary order while still respecting data dependencies and ensuring appropriate ordering for semantic correctness. However, a typical memory interface for out-of-order processors (i.e., a load-store queue) cannot immediately fulfill these requirements: a different allocation policy is needed to achieve out-of-order execution in a spatial system. We show a practical way to organize the allocation for an out-of-order load-store queue for spatial computing by dynamically allocating groups of memory accesses, where the access order within the group is statically predetermined (for instance by a high-level synthesis tool).","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122583414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 31
Escher: A CNN Accelerator with Flexible Buffering to Minimize Off-Chip Transfer
Yongming Shen, M. Ferdman, Peter Milder
{"title":"Escher: A CNN Accelerator with Flexible Buffering to Minimize Off-Chip Transfer","authors":"Yongming Shen, M. Ferdman, Peter Milder","doi":"10.1109/FCCM.2017.47","DOIUrl":"https://doi.org/10.1109/FCCM.2017.47","url":null,"abstract":"Convolutional neural networks (CNNs) are used to solve many challenging machine learning problems. Interest in CNNs has led to the design of CNN accelerators to improve CNN evaluation throughput and efficiency. Importantly, the bandwidth demand from weight data transfer for modern large CNNs causes CNN accelerators to be severely bandwidth bottlenecked, prompting the need for processing images in batches to increase weight reuse. However, existing CNN accelerator designs limit the choice of batch sizes and lack support for batch processing of convolutional layers. We observe that, for a given storage budget, choosing the best batch size requires balancing the input and weight transfer. We propose Escher, a CNN accelerator with a flexible data buffering scheme that ensures a balance between the input and weight transfer bandwidth, significantly reducing overall bandwidth requirements. For example, compared to the state-of-the-art CNN accelerator designs targeting a Virtex-7 690T FPGA, Escher reduces the accelerator peak bandwidth requirements by 2.4x across both fully-connected and convolutional layers on fixed-point AlexNet, and reduces convolutional layer bandwidth by up to 10.5x on fixed-point GoogleNet.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121232190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 95
High-Performance Hardware Merge Sorter
Susumu Mashimo, Thiem Van Chu, Kenji Kise
{"title":"High-Performance Hardware Merge Sorter","authors":"Susumu Mashimo, Thiem Van Chu, Kenji Kise","doi":"10.1109/FCCM.2017.19","DOIUrl":"https://doi.org/10.1109/FCCM.2017.19","url":null,"abstract":"State-of-the-art studies show that FPGA-based hardware merge sorters (HMSs) can achieve superior performance compared with optimized algorithms on CPUs and GPUs. The performance of any HMS is proportional to its operating frequency (F) and the number of records that can be output each cycle (E). However, all existing HMSs have a problem that F drops significantly with increasing E due to the increase of the number of levels of gates. In this paper, we propose novel architectures for HMSs where the number of levels of gates is constant when E is increased. We implement some HMSs adopting the proposed architectures on a Virtex-7 FPGA. The evaluation shows that an HMS of E = 32 operates at 311MHz and achieves 3.13x higher throughput than the state-of-the-art HMS.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114998600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 46
Bonded Force Computations on FPGAs
Qingqing Xiong, M. Herbordt
{"title":"Bonded Force Computations on FPGAs","authors":"Qingqing Xiong, M. Herbordt","doi":"10.1109/FCCM.2017.49","DOIUrl":"https://doi.org/10.1109/FCCM.2017.49","url":null,"abstract":"While acceleration of Molecular Dynamics has received much attention, a significant part of that application, the bonded force calculation, has not. We present what we believe to be the first description and analysis of bonded force calculations outside of ASICs. We characterize the computational requirements. We find that a naive direct implementation requires FPGA resources out of proportion with its proportion of the workload. We investigate other options including various softcores and speed/area tradeoffs. These result in an assortment of solutions optimal for various combinations of problem and cluster size.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115231388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
FPGA-Based Real-Time Charged Particle Trajectory Reconstruction at the Large Hadron Collider
E. Bartz, J. Chaves, Y. Gershtein, E. Halkiadakis, M. Hildreth, S. Kyriacou, K. Lannon, A. Lefeld, A. Ryd, L. Skinnari, R. Stone, C. Strohman, Z. Tao, B. Winer, P. Wittich, Zhiru Zhang, M. Zientek
{"title":"FPGA-Based Real-Time Charged Particle Trajectory Reconstruction at the Large Hadron Collider","authors":"E. Bartz, J. Chaves, Y. Gershtein, E. Halkiadakis, M. Hildreth, S. Kyriacou, K. Lannon, A. Lefeld, A. Ryd, L. Skinnari, R. Stone, C. Strohman, Z. Tao, B. Winer, P. Wittich, Zhiru Zhang, M. Zientek","doi":"10.1109/FCCM.2017.27","DOIUrl":"https://doi.org/10.1109/FCCM.2017.27","url":null,"abstract":"The upgrades of the Compact Muon Solenoid particle physics experiment at CERN's Large Hadron Collider provide a major challenge for the real-time collision data selection. This paper presents a novel approach to pattern recognition and charged particle trajectory reconstruction using an all-FPGA solution. The challenges include a large input data rate of about 20 to 40 Tbps, processing a new batch of input data every 25 ns, each consisting of about 10,000 precise position measurements of particles ('stubs'), performing the pattern recognition on these stubs to find the trajectories, and producing the list of parameters describing these trajectories within 4 μs. A proposed solution to this problem is described, in particular, the implementation of the pattern recognition and particle trajectory determination using an all-FPGA system. The results of an end-to-end demonstrator system based on Xilinx Virtex-7 FPGAs that meets timing and performance requirements are presented.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114227380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2