{"title":"ADDHard","authors":"Sai Manoj Pudukotai Dinakarrao, A. Jantsch","doi":"10.1145/3194554.3194647","DOIUrl":"https://doi.org/10.1145/3194554.3194647","url":null,"abstract":"Anomaly detection in Electrocardiogram (ECG) signals facilitates the diagnosis of cardiovascular diseases i.e., arrhythmias. Existing methods, although fairly accurate, demand a large number of computational resources. Based on the pre-processing of ECG signal, we present a low-complex digital hardware implementation (ADDHard) for arrhythmia detection. ADDHard has the advantages of low-power consumption and a small foot print. ADDHard is suitable especially for resource constrained systems such as body wearable devices. Its implementation was tested with the MIT-BIH arrhythmia database and achieved an accuracy of 97.28% with a specificity of 98.25% on average.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123074987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Cache Management Scheme for Capacitor Equipped Solid State Drives","authors":"Congming Gao, Liang Shi, Yejia Di, Qiao Li, C. Xue, E. Sha","doi":"10.1145/3194554.3194639","DOIUrl":"https://doi.org/10.1145/3194554.3194639","url":null,"abstract":"Within SSDs, random access memory (RAM) has been adopted as cache inside controller for achieving better performance. However, due to the volatility characteristic of RAM, data loss may happen when sudden power interrupts. To solve this issue, capacitor has been equipped inside emerging SSDs as interim supplier. However, the aging issue of capacitor will result in capacitance decreases over time. Once the remaining capacitance is not able to write all dirty pages in the cache back to flash memory, data loss may happen. In order to solve the above issue, an efficient cache management scheme for capacitor equipped SSDs is proposed in this work. The basic idea of the scheme is to bound the number of dirty pages in cache within the capability of the capacitor. Simulation results show that the proposed scheme achieves encourage improvement on lifetime and performance while power interruption induced data loss is avoided.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126848587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dataflow-Based Mapping of Spiking Neural Networks on Neuromorphic Hardware","authors":"Anup Das, Akash Kumar","doi":"10.1145/3194554.3194627","DOIUrl":"https://doi.org/10.1145/3194554.3194627","url":null,"abstract":"Spiking Neural Networks (SNNs) are powerful computation engines for pattern recognition and image classification applications. Apart from application performance such as recognition and classification accuracy, system performance such as throughput becomes important when executing these applications on a hardware. We propose a systematic design-flow to map SNN-based applications on a crossbar-based neuromorphic hardware, guaranteeing application as well as system performance. Synchronous Dataflow Graphs (SDFGs) are used to model these applications with extended semantics to represent neural network topologies. Self-timed scheduling is then used to analyze throughput, incorporating hardware constraints such as synaptic memory, communication and I/O bandwidth of crossbars. Our design-flow integrates CARLsim, a GPU-accelerated application-level SNN simulator with SDF3, a tool for mapping SDFG on hardware. We conducted experiments with realistic and synthetic SNNs on representative neuromorphic hardware, demonstrating throughput-resource trade-offs for a given application performance. For throughput-constrained applications, we show average 20% reduction of hardware usage with 19% reduction in energy consumption. For throughput-scalable applications, we show an average 53% higher throughput compared to a state-of-the-art approach.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121446312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Session 7: Machine Learning and HW Accelerators","authors":"Fatemeh Tehranipoor","doi":"10.1145/3252913","DOIUrl":"https://doi.org/10.1145/3252913","url":null,"abstract":"","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130250538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy and Performance Efficient Computation Offloading for Deep Neural Networks in a Mobile Cloud Computing Environment","authors":"Amir Erfan Eshratifar, Massoud Pedram","doi":"10.1145/3194554.3194565","DOIUrl":"https://doi.org/10.1145/3194554.3194565","url":null,"abstract":"In today's computing technology scene, mobile devices are considered to be computationally weak, while large cloud servers are capable of handling expensive workloads, therefore, intensive computing tasks are typically offloaded to the cloud. Recent advances in learning techniques have enabled Deep Neural Networks (DNNs) to be deployed in a wide range of applications. Commercial speech based intelligent personal assistants (IPA) like Apple's Siri, which employs DNN as its recognition model, operate solely over the cloud. The cloud-only approach may require a large amount of data transfer between the cloud and the mobile device. The mobile-only approach may lack performance efficiency. In addition, the cloud server may be slow at times due to the congestion and limited subscription and mobile devices may have battery usage constraints. In this paper, we investigate the efficiency of offloading only some parts of the computations in DNNs to the cloud. We have formulated an optimal computation offloading framework for forward propagation in DNNs, which adapts to battery usage constraints on the mobile side and limited available resources on the cloud. Our simulation results show that our framework can achieve 1.42x on average and up to 3.07x speedup in the execution time on the mobile device. In addition, it results in 2.11x on average and up to 4.26x reduction in mobile energy consumption.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132813910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Consumption and Lifetime Improvement of Coarse-Grained Reconfigurable Architectures Targeting Low-Power Error-Tolerant Applications","authors":"H. Afzali-Kusha, O. Akbari, M. Kamal, M. Pedram","doi":"10.1145/3194554.3194631","DOIUrl":"https://doi.org/10.1145/3194554.3194631","url":null,"abstract":"In this work, the application of a voltage over-scaling (VOS) technique for improving the lifetime and reliability of coarse-grained reconfigurable architectures (GCRAs) is presented. The proposed technique, which may be applied to CGRAs used as accelerators for low-power, error-tolerant applications, reduces the (strongly voltage-dependent) wearout effects and the energy consumption of processing elements (PEs) whenever the error impact on the output quality degradation can be tolerated. This provides us with the ability to lessen the wearout and reduce energy consumption of PEs when accuracy requirement for the results is rather low. Multiple degrees of computational accuracy can be achieved by using different overscaled voltage levels for the PEs. The efficacy of the proposed technique is studied by considering the bias temperature instability. The study is performed for two error-resilient applications. The CGRAs are implemented with 15nm FinFET operating at a nominal supply voltage of 0.8V. In addition, supply voltages of 0.75, 0.7, 0.65, and 0.6V are considered as overscaled voltage levels for this technology. Based on the quality constraint requirements of the benchmarks, optimum overscaled voltage levels for various PEs are determined and utilized. The approach may provide considerable lifetime and energy consumption improvements over those of the conventional exact and approximate computation approaches.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"1984 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114089276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MC3A","authors":"Lahir Marni, M. Hosseini, T. Mohsenin","doi":"10.1145/3194554.3194577","DOIUrl":"https://doi.org/10.1145/3194554.3194577","url":null,"abstract":"The paper presents \"MC3A\"- Markov Chain Monte Carlo Many Core Accelerator, a high-throughput, domain-specific, programmable manycore accelerator, which effectively generates samples from a provided target distribution. MCMC samplers are used in machine learning, image and signal processing applications that are computationally intensive. In such scenarios, high-throughput samplers are of paramount importance. To achieve a high-throughput platform, we add two domain-specific instructions with dedicated hardware whose functions are extensively used in MCMC algorithms. These instructions bring down the number of clock cycles needed to implement the respective functions by 10x and 21x. A 64-cluster architecture of the MC3A is fully placed and routed in 65 nm, TSMC CMOS technology, where the VLSI layout of each cluster occupies an area of 0.577 mm^2 while consuming a power of 247 mW running at 1 GHz clock frequency. Our proposed MC3A achieves 6x higher throughput than its equivalent predecessor (PENC) and consumes 4x lower energy per sample. Also, when compared to other off-the-shelf platforms, such as Jetson TX1 and TX2 SoC, MC3A results in 195x and 191x higher throughput and consumes 808x and 726x lower energy per sample generation, respectively.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115834744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Hardware Accelerator for Mini-Batch Gradient Descent","authors":"Sandeep Rasoori, V. Akella","doi":"10.1145/3194554.3194559","DOIUrl":"https://doi.org/10.1145/3194554.3194559","url":null,"abstract":"Iterative first-order methods that use gradient information form the core computation kernels of modern statistical data analytic engines, such as MADLib, Impala, Google Brain, GraphLab, MLlib in Spark, among others. Even the most advanced parallel stochastic gradient descent algorithm, such as Hogwild is not very scalable on conventional chip multiprocessors because of the bottlenecks induced by the memory system when sharing large model vectors. We propose a scalable architecture for large scale parallel gradient descent on a Field Programmable Gate Array (FPGA) by taking advantage of the large amount of embedded memory in modern FPGAs. We propose a novel data layout mechanism that eliminates the need for expensive synchronization and locking of shared data, which makes the architecture scalable. A 32-PE system on the Stratix V FPGA shows about 5x improvement in performance compared to state-of-the-art implementation on a 14 core/28 thread Intel Xeon CPU with 64 GB memory and operating at 2.6 GHz.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"2015 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121006685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FLexiTASK","authors":"Joel Mandebi Mbongue, Danielle Tchuinkou Kwadjo, C. Bobda","doi":"10.1145/3194554.3194644","DOIUrl":"https://doi.org/10.1145/3194554.3194644","url":null,"abstract":"One of the major obstacles to the adoption of FPGAs in high-performance computing is their programmability. It requires hardware design skills and long compilation times. Overlays have been proposed as a way to abstract FPGA resources. Unfortunately, most of the time, the topologies they use to connect computing cores impose restrictions on where tasks are placed and how they communicate. In this paper, we propose an overlay architecture designed for efficiency and flexibility. It features a novel Network-on-Chip (NoC) infrastructure making flexible, with no limitation, the placement of hardware tasks. The presented architecture allows tasks to communicate with a low latency and eases the reconfiguration of desired areas on the fabric at runtime. After prototyping the proposed architecture on an Altera Cyclone V FPGA, a maximum frequency of 282 MHz has been reached and a speedup ranging from 4x to 195x has been observed in some applications compared to the native execution.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121303501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-Energy Architectures of Linear Classifiers for IoT Applications using Incremental Precision and Multi-Level Classification","authors":"Sandhya Koteshwara, K. Parhi","doi":"10.1145/3194554.3194603","DOIUrl":"https://doi.org/10.1145/3194554.3194603","url":null,"abstract":"This paper presents a novel incremental-precision classification approach that leads to a reduction in energy consumption of linear classifiers for IoT applications. Features are first input to a low-precision classifier. If the classifier successfully classifies the sample, then the process terminates. Otherwise, the classification performance is incrementally improved by using a classifier of higher precision. This process is repeated until the classification is complete. The argument is that many samples can be classified using the low-precision classifier, leading to a reduction in energy. To achieve incremental-precision, a novel data-path decomposition is proposed to design of fixed-width adders and multipliers. These components improve the precision without recalculating the outputs, thus reducing energy. Using a linear classification example, it is shown that the proposed incremental-precision based multi-level classifier approach can reduce energy by about 41% while achieving comparable accuracies as that of a full-precision system.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"975 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116214655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}