2017 International Symposium on Rapid System Prototyping (RSP): Latest Publications

Constructing Fast and Cycle-Accurate Simulators for Configurable Accelerators Using C++ Templates
2017 International Symposium on Rapid System Prototyping (RSP) Pub Date : 2017-10-19 DOI: 10.1145/3130265.3130324
Michael Witterauf, Frank Hannig, J. Teich
Abstract: To quickly prototype accelerator/compiler co-designs, fast and highly accurate architectural simulators are indispensable. They must be fast to keep design iteration times low, and highly accurate to make simulation results meaningful. In this paper, we describe how to construct such fast, cycle-accurate simulators from an architectural model by using C++ templates. Not only are templates fully resolved at compile time, offering ample opportunity for optimization, they also aptly mirror the synthesis-time parameterization of accelerators. For each hardware component, we encode its architecture parameters in a C++ type and construct a class templated on this type. Hierarchically composing the component classes then yields the overall simulator. To demonstrate the speedup of our constructed simulators, we construct two simulators for a lightweight VLIW processor, one with and one without templates, and measure their performance: the templated simulator is about 4.85 times faster. Their execution speed makes our simulators well suited for compiler validation and for prototyping accelerator features.
Citations: 0
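The abstract's central idea (encode each component's architecture parameters in a C++ type, template the component class on that type, and compose components hierarchically) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' simulator: the parameter types `RegFileParams` and `CoreParams`, the width masking, and the toy per-cycle behavior in `Core::step` are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical parameter type: architecture parameters of a register file
// encoded as compile-time constants of a C++ type.
template <std::size_t NumRegs, std::size_t Width>
struct RegFileParams {
    static constexpr std::size_t num_regs = NumRegs;
    static constexpr std::size_t width = Width;
};

// Component class templated on its parameter type. Because num_regs is a
// compile-time constant, storage is a fixed-size std::array.
template <typename P>
class RegisterFile {
public:
    std::uint64_t read(std::size_t i) const {
        assert(i < P::num_regs);
        return regs_[i];
    }
    void write(std::size_t i, std::uint64_t v) {
        assert(i < P::num_regs);
        regs_[i] = v & mask();  // emulate a Width-bit register
    }

private:
    // Resolved entirely at compile time; the discarded branch is never
    // instantiated, so a 64-bit width does not trigger an oversized shift.
    static constexpr std::uint64_t mask() {
        if constexpr (P::width >= 64) {
            return ~std::uint64_t{0};
        } else {
            return (std::uint64_t{1} << P::width) - 1;
        }
    }
    std::array<std::uint64_t, P::num_regs> regs_{};
};

// Hypothetical top-level parameters: a VLIW-style core with a number of
// issue slots and a nested register-file parameter type.
template <std::size_t Slots, typename RF>
struct CoreParams {
    static constexpr std::size_t issue_slots = Slots;
    using RegFile = RF;
};

// Hierarchical composition: the core instantiates its sub-components from
// the nested parameter types.
template <typename P>
class Core {
    static_assert(P::issue_slots <= P::RegFile::num_regs,
                  "toy model: each issue slot owns one register");

public:
    // One simulated cycle. Placeholder behavior: slot s adds (s + 1) to
    // register s.
    void step() {
        for (std::size_t s = 0; s < P::issue_slots; ++s) {
            rf_.write(s, rf_.read(s) + s + 1);
        }
        ++cycles_;
    }
    std::uint64_t cycles() const { return cycles_; }
    const RegisterFile<typename P::RegFile>& regs() const { return rf_; }

private:
    RegisterFile<typename P::RegFile> rf_;
    std::uint64_t cycles_ = 0;
};
```

Instantiating, say, `Core<CoreParams<4, RegFileParams<32, 8>>>` fixes every parameter at compile time, so the compiler is free to unroll the issue-slot loop and fold the width mask into a constant, and the `static_assert` rejects an inconsistent configuration before the simulator ever runs.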
Binary Synthesis Implementing External Interrupt Handler as Independent Module
2017 International Symposium on Rapid System Prototyping (RSP) Pub Date : 2017-10-19 DOI: 10.1145/3130265.3130317
Naoya Ito, Yuuki Oosako, N. Ishiura, H. Kanbara, H. Tomiyama
Abstract: This article presents a method of synthesizing hardware from a given executable binary containing an external interrupt handler, in which the normal flow and the interrupt handling are executed by separate hardware modules. Our previous method synthesized the whole program into a single hardware module, in which register save/restore limited when interrupt handling could start and also impaired the efficiency of the synthesized hardware. By executing the two tasks on separate modules, register save/restore can be eliminated, which allows the interrupt handler to start at arbitrary timing and reduces both the response time and the hardware cost. Because the two processes can run in parallel, the total execution time is also reduced. An experiment with a simple program showed that the execution cycles and the delay were reduced by about 80% and 20%, respectively, compared with a MIPS CPU. A motor controller driven by periodic timer interrupts has been successfully synthesized from C and assembly programs, and it runs more than 20 times faster than the MIPS CPU.
Citations: 3
GeCo: Classification Restricted Boltzmann Machine Hardware for On-Chip Learning
2017 International Symposium on Rapid System Prototyping (RSP) Pub Date : 2017-10-19 DOI: 10.1145/3130265.3138856
Wooseok Yi, Junki Park, Jae-Joon Kim
Abstract: We present Classification Restricted Boltzmann Machine (ClassRBM) hardware with on-chip learning capability for embedded machines. The RBM is a generative model and has been used as one of the most popular feature extractors and image preprocessors. The ClassRBM is a variant of the RBM adapted to classification tasks. We propose a multi-Neuron-Per-Class (multi-NPC) voting scheme to improve the accuracy of the ClassRBM. We also show that Contrastive Divergence (CD), one of the most popular algorithms for training RBMs, has limitations in multi-NPC ClassRBM learning, and we propose a modified CD algorithm to overcome them. Experimental results on an FPGA platform with the MNIST dataset confirm that the classification accuracy of the proposed algorithm is about 2.12% higher than that of conventional CD.
Citations: 1
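The abstract names the multi-NPC voting scheme without spelling out its decision rule. Assuming that each class owns several label neurons laid out class by class, and that a class's vote is the sum of its neurons' activations, inference could look like the sketch below. The function name `multi_npc_vote`, the class-major layout, and summation as the voting rule are our assumptions; the modified CD training algorithm is not modeled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical multi-Neuron-Per-Class voting: activations holds
// num_classes * neurons_per_class label-neuron outputs, grouped by class.
// Each class scores the sum of its neurons; the highest total wins.
std::size_t multi_npc_vote(const std::vector<double>& activations,
                           std::size_t num_classes,
                           std::size_t neurons_per_class) {
    if (num_classes == 0 || neurons_per_class == 0 ||
        activations.size() != num_classes * neurons_per_class) {
        throw std::invalid_argument("activation count does not match layout");
    }
    std::size_t best_class = 0;
    double best_score = 0.0;
    for (std::size_t c = 0; c < num_classes; ++c) {
        double score = 0.0;
        for (std::size_t k = 0; k < neurons_per_class; ++k) {
            score += activations[c * neurons_per_class + k];
        }
        // Ties keep the lower class index.
        if (c == 0 || score > best_score) {
            best_class = c;
            best_score = score;
        }
    }
    return best_class;
}
```

For example, with two classes and two neurons each, the activations {0.95, 0.0, 0.6, 0.6} give class 0 the single strongest neuron, but class 1 wins the vote (1.2 vs. 0.95). Pooling several neurons per class in this way is one plausible reason a voting scheme could be more robust than a single label neuron.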
Speculative Execution in Distributed Controllers for High-Level Synthesis
2017 International Symposium on Rapid System Prototyping (RSP) Pub Date : 2017-10-19 DOI: 10.1145/3130265.3130319
Miho Shimizu, N. Ishiura, Sayuri Ota, W. Nakano
Abstract: This paper proposes a method of incorporating speculative execution into distributed control, which enables efficient dynamic scheduling. In the presence of variable-latency units, the static scheduling used in conventional high-level synthesis causes wasteful waits. Distributed control enables dynamic scheduling, which adjusts the execution timing of operations at run time. In this paper, we attempt to further improve speed by introducing speculative execution based on branch prediction into distributed control. Experimental results on two examples showed that the number of execution cycles was reduced by 11.1% to 21.9% when the prediction hit rate was 75%.
Citations: 1
Prototyping Dynamic Task Migration on Heterogeneous Reconfigurable Systems
2017 International Symposium on Rapid System Prototyping (RSP) Pub Date : 2017-10-19 DOI: 10.1145/3130265.3130316
Arief Wicaksana, A. Bourge, O. Muller, A. Sasongko, F. Rousseau
Abstract: Reconfigurable devices such as FPGAs are known to offer excellent performance and high computational efficiency. Owing to their recently improved capacity and more efficient architectures, there is growing interest in using FPGAs as coprocessors in reconfigurable systems. However, FPGAs still lack support for dynamic scheduling, e.g., for managing multiple tasks or users in a system. Runtime task relocation or load distribution is not possible unless the reconfigurable system supports dynamic task migration. This ability requires automated configuration and context management in the reconfigurable architecture, which existing solutions do not provide. In this paper, we propose a framework for prototyping dynamic task migration between heterogeneous FPGAs. A task running on one FPGA can be suspended and resumed on another FPGA with a different architecture. Extracting and restoring FPGA register and memory values is made possible by a task-specific extraction mechanism provided by the tasks themselves. The proposed framework exploits a high-performance embedded processor tightly coupled to an FPGA to manage the configuration and context automatically. The implementation uses two popular heterogeneous reconfigurable systems, the Xilinx Zynq ZC706 and the Altera Arria V SoC. Tests are performed with graphical and non-graphical benchmark applications, and performance results are presented.
Citations: 3
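The abstract attributes migration between FPGAs of different architectures to a task-specific mechanism that extracts and restores register and memory values. As a software analogy only (the actual framework manages FPGA configuration and context through an embedded processor), the suspend, extract, and restore flow can be sketched with a platform-neutral context object. `TaskContext`, `MigratableTask`, `AccumulatorTask`, and `migrate` are hypothetical names, and no Zynq or Arria V API is used.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical platform-neutral snapshot of a task's state.
struct TaskContext {
    std::map<std::string, std::uint32_t> registers;  // named task registers
    std::vector<std::uint8_t> memory;                // task-local memory image
};

// Each task supplies its own extraction/restoration routines, standing in
// for the task-specific extraction mechanism described in the abstract.
class MigratableTask {
public:
    virtual ~MigratableTask() = default;
    virtual TaskContext extract() const = 0;
    virtual void restore(const TaskContext& ctx) = 0;
};

// Toy task: accumulates the bytes of a buffer, a few elements at a time.
class AccumulatorTask : public MigratableTask {
public:
    explicit AccumulatorTask(std::vector<std::uint8_t> data)
        : data_(std::move(data)) {}

    // Processes up to n elements; returns true once the buffer is done.
    bool run(std::size_t n) {
        for (std::size_t i = 0; i < n && index_ < data_.size(); ++i) {
            sum_ += data_[index_++];
        }
        return index_ == data_.size();
    }
    std::uint32_t sum() const { return sum_; }

    TaskContext extract() const override {
        TaskContext ctx;
        ctx.registers["index"] = index_;
        ctx.registers["sum"] = sum_;
        ctx.memory = data_;
        return ctx;
    }
    void restore(const TaskContext& ctx) override {
        index_ = ctx.registers.at("index");
        sum_ = ctx.registers.at("sum");
        data_ = ctx.memory;
    }

private:
    std::vector<std::uint8_t> data_;
    std::uint32_t index_ = 0;
    std::uint32_t sum_ = 0;
};

// Suspend on one device, resume on another: the context is the only state
// that crosses the boundary.
void migrate(const MigratableTask& from, MigratableTask& to) {
    to.restore(from.extract());
}
```

Because each task decides which state to export, the destination only has to understand the neutral `TaskContext`, not the source device's internal layout; that separation is what makes resuming on a different architecture plausible in this sketch.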