Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion: Latest Articles

Prediction based convolution neural network acceleration: work-in-progress
Y. Yao, Zhonghai Lu
{"title":"Prediction based convolution neural network acceleration: work-in-progress","authors":"Y. Yao, Zhonghai Lu","doi":"10.1145/3125501.3125523","DOIUrl":"https://doi.org/10.1145/3125501.3125523","url":null,"abstract":"Although intra-layer parallelism is commonly used to expedite CNN execution, it is difficult to achieve inter-layer parallelism because of data dependence between layers. In the paper, we propose a two-phase prediction and correction mechanism to break the data dependence between CNN layers so as to enable inter-layer parallelism. Our technique achieves one more order of magnitude (from the order of 10 to the order of 100) CNN acceleration compared to other three state-of-the-art GPU based CNN acceleration mechanisms.","PeriodicalId":259093,"journal":{"name":"Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122321314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A "high resilience" mode to minimize soft error vulnerabilities in ARM cortex-R CPU pipelines: work-in-progress
X. Iturbe, Balaji Venu, John Penton, Emre Ozer
{"title":"A \"high resilience\" mode to minimize soft error vulnerabilities in ARM cortex-R CPU pipelines: work-in-progress","authors":"X. Iturbe, Balaji Venu, John Penton, Emre Ozer","doi":"10.1145/3125501.3125509","DOIUrl":"https://doi.org/10.1145/3125501.3125509","url":null,"abstract":"This paper proposes a \"high resilience\" execution mode to increase the robustness of CPU pipelines to soft errors when executing critical software routines. The proposed execution mode reduces the error rate by approximately 11% in an ARM Cortex-R5 CPU, and requires only a few minor modifications to be made in its microarchitecture. These modifications do not impact the characteristic area, power consumption and performance features of the original CPU.","PeriodicalId":259093,"journal":{"name":"Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132511799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Enabling reliable main memory using STT-MRAM via restore-aware memory management: work-in-progress
Armin Haj Aboutalebi, Lide Duan
{"title":"Enabling reliable main memory using STT-MRAM via restore-aware memory management: work-in-progress","authors":"Armin Haj Aboutalebi, Lide Duan","doi":"10.1145/3125501.3125517","DOIUrl":"https://doi.org/10.1145/3125501.3125517","url":null,"abstract":"As an important non-volatile memory technology, STT-MRAM is widely considered as a universal memory solution in current processors. Employing STT-MRAM as the main memory offers a wide variety of benefits, but also results in unique design challenges. In particular, read disturbance characterizes accidental data corruption in STT-MRAM after it is read, leading to a need of restoring data back to memory after each read operation. These extra restores significantly degrade system performance and energy efficiency, greatly changing the timing scenarios that conventional designs were optimized for. As a result, directly adopting conventional, restore-agnostic memory management techniques may lead to sub-optimal designs for STT-MRAM. In this work, we propose Restore-Aware Policy Selection (RAPS), a dynamic and hybrid row buffer management scheme that factors in the inevitable data restores in STT-MRAM-based main memory. RAPS monitors the row buffer hit rate at run time, dynamically switching between the open- and close-page policies. By factoring in restores, RAPS accurately captures the optimal design points, achieving optimal policy selections at run time. Our experimental results show that RAPS significantly improves system performance and energy efficiency compared to the conventional policies.","PeriodicalId":259093,"journal":{"name":"Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115213782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Improving NVMe SSD I/O determinism with PCIe virtual channel: work-in-progress
Seonbong Kim, Joon-Sung Yang
{"title":"Improving NVMe SSD I/O determinism with PCIe virtual channel: work-in-progress","authors":"Seonbong Kim, Joon-Sung Yang","doi":"10.1145/3125501.3125520","DOIUrl":"https://doi.org/10.1145/3125501.3125520","url":null,"abstract":"NVMe SSD over PCIe is attractive since it provides high throughput and low latency. However, complex internal SSD operations may cause a non-deterministic I/O latency which is one of the most important factors in a storage system. While conventional approaches to enhance I/O latency prediction are based on host systems, this paper proposes a novel SSD-based deterministic latency enhancement scheme. The proposed method exploits the fact that multiple virtual channels can be utilized. For each virtual channel, the proposed method assigns a different priority for data transmission. NVMe SSD analyses its internal latency and dynamically chooses the virtual channels to compensate the latency. The experimental results show that, using a PCIe switch model, the proposed method can save 41.6% of the latency for each transaction layer packet.","PeriodicalId":259093,"journal":{"name":"Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128362237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-grained performance estimation for MPSoC compilers: work-in-progress
M. Aguilar, Abhishek Aggarwal, Awaid Shaheen, R. Leupers, G. Ascheid, J. Castrillón, L. Fitzpatrick
{"title":"Multi-grained performance estimation for MPSoC compilers: work-in-progress","authors":"M. Aguilar, Abhishek Aggarwal, Awaid Shaheen, R. Leupers, G. Ascheid, J. Castrillón, L. Fitzpatrick","doi":"10.1145/3125501.3125521","DOIUrl":"https://doi.org/10.1145/3125501.3125521","url":null,"abstract":"Parallelizing compilers are a promising solution to tackle key challenges of MPSoC programming. One fundamental aspect for a profitable parallelization is to estimate the performance of the applications on the target platforms. There is a wide range of state-of-the-art performance estimation techniques, such as, simulation-based, measurement-based, among others. They provide performance estimates typically only at function or basic block granularity. However, MPSoC compilers require performance information at other granularities, such as statement, loop or even arbitrary code blocks. In this paper, we propose a framework to adapt performance information sources to any granularity required by an MPSoC compiler.","PeriodicalId":259093,"journal":{"name":"Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116984913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A high-performance FPGA accelerator for sparse neural networks: work-in-progress
Yuntao Lu, Lei Gong, Chongchong Xu, Fan Sun, Yiwei Zhang, Chao Wang, Xuehai Zhou
{"title":"A high-performance FPGA accelerator for sparse neural networks: work-in-progress","authors":"Yuntao Lu, Lei Gong, Chongchong Xu, Fan Sun, Yiwei Zhang, Chao Wang, Xuehai Zhou","doi":"10.1145/3125501.3125510","DOIUrl":"https://doi.org/10.1145/3125501.3125510","url":null,"abstract":"Neural networks have been widely used in a large range of domains; researchers tune the numbers of layers, neurons and synapses to adapt to various applications. As a consequence, neural network models are both computation- and memory-intensive. Because of their large memory and computing requirements, it is difficult to deploy neural networks on resource-limited platforms. Sparse neural networks, which prune redundant neurons and synapses, alleviate computation and memory pressure. However, conventional accelerators cannot benefit from the sparse feature. In this paper, we propose a high-performance FPGA accelerator for sparse neural networks which eliminates redundant computations and storage space. This work compresses sparse weights and processes compressed data directly. Experimental results demonstrate that our accelerator will reduce 50% and 10% storage of convolutional and fully-connected layers, and achieve 3x speedup of performance over an optimized conventional FPGA accelerator.","PeriodicalId":259093,"journal":{"name":"Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126513774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
SSS: self-aware system-on-chip using static-dynamic hybrid method (work-in-progress)
Gaoming Du, Shibi Ma, Zhenmin Li, Zhonghai Lu, Yiming Ouyang, M. Gao
{"title":"SSS: self-aware system-on-chip using static-dynamic hybrid method (work-in-progress)","authors":"Gaoming Du, Shibi Ma, Zhenmin Li, Zhonghai Lu, Yiming Ouyang, M. Gao","doi":"10.1145/3125501.3125527","DOIUrl":"https://doi.org/10.1145/3125501.3125527","url":null,"abstract":"Network on chip has become the de facto communication standard for multi-core or many-core system on chip, due to its scalability and flexibility. However, temperature is an important factor in NoC design, which affects the overall performance of SoC---decreasing circuit frequency, increasing energy consumption, and even shortening chip lifetime. In this paper, we propose SSS, a self-aware SoC using a static-dynamic hybrid method, which combines dynamic mapping and static mapping to reduce the hot-spots temperature for NoC based SoCs. First, we propose monitoring the thermal distribution for self-state sensoring. Then, in static mapping stage, we calculate the optimal mapping solutions under different temperature modes using discrete firefly algorithm to help self-decision making. Finally, in dynamic mapping stage, we achieve dynamic mapping through configuring NoC and SoC sentient unit for self-optimizing. Experimental results show SSS can reduce the peak temperature by up to 30.64%. FPGA prototype shows the effectiveness and smartness of SSS in reducing hot-spots temperature.","PeriodicalId":259093,"journal":{"name":"Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126179521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Efficient pulsed-latch implementation for multiport register files: work-in-progress
W. Elsharkasy, Hasan Erdem Yantır, A. Djahromi, A. Eltawil, F. Kurdahi
{"title":"Efficient pulsed-latch implementation for multiport register files: work-in-progress","authors":"W. Elsharkasy, Hasan Erdem Yantır, A. Djahromi, A. Eltawil, F. Kurdahi","doi":"10.1145/3125501.3125515","DOIUrl":"https://doi.org/10.1145/3125501.3125515","url":null,"abstract":"In this paper, register file design using pulsed latches is presented. Having some advantages in performance, area and power, pulsed latches represent an attractive implementation of register files. In addition, a proposed multiport register file architecture is introduced using single physical read/write ports to virtualize additional ports for read and write. The initial results show huge savings in area and power in comparison to the traditional architectures.","PeriodicalId":259093,"journal":{"name":"Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126583626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Enabling NVM-based deep learning acceleration using nonuniform data quantization: work-in-progress
Hao Yan, Ethan C. Ahn, Lide Duan
{"title":"Enabling NVM-based deep learning acceleration using nonuniform data quantization: work-in-progress","authors":"Hao Yan, Ethan C. Ahn, Lide Duan","doi":"10.1145/3125501.3125516","DOIUrl":"https://doi.org/10.1145/3125501.3125516","url":null,"abstract":"Apart from employing a co-processor (e.g., GPU) for neural network (NN) computation, utilizing the unique characteristics of nonvolatile memories (NVM), including RRAM, phase change memory (PCM), and STT-MRAM, to accelerate NN algorithms has been extensively studied. In such approaches, input data and synaptic weights are represented using word line voltages and cell resistance, with the resulting bit line current indicating the calculation result. However, the limited number of resistance levels in a NVM cell largely reduces the algorithm data precision, thus significantly lowering the model inference accuracy. Motivated by the observation that the conventional, uniformly generated data quantization points are not equally important to the model, we propose a nonuniform data quantization scheme to better represent the model in NVM cells and minimize the inference accuracy loss. Our experimental results show that the proposed scheme can achieve highly accurate deep learning model inference using as low as only 4 bits for synaptic weight representation. This effectively enables a NVM with few cell resistance levels (e.g., STT-MRAM) to perform NN calculation, and also results in additional benefits in performance, energy, and memory storage.","PeriodicalId":259093,"journal":{"name":"Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123691857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
REDEFINE®™: a case for WCET-friendly hardware accelerators for real time applications (work-in-progress)
K. Madhu, Tarun Singla, S. Nandy, R. Narayan, Francois Neumann, P. Baufreton
{"title":"REDEFINE®™: a case for WCET-friendly hardware accelerators for real time applications (work-in-progress)","authors":"K. Madhu, Tarun Singla, S. Nandy, R. Narayan, Francois Neumann, P. Baufreton","doi":"10.1145/3125501.3125526","DOIUrl":"https://doi.org/10.1145/3125501.3125526","url":null,"abstract":"REDEFINE is a distributed dynamic dataflow architecture, designed for exploiting parallelism at various granularities as an embedded system-on-chip (SoC). This paper dwells on the flexibility of REDEFINE architecture and its execution model in accelerating real-time applications coupled with a WCET analyzer that computes execution time bounds of real time applications.","PeriodicalId":259093,"journal":{"name":"Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124784872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0