Proceedings of the 2018 Great Lakes Symposium on VLSI — Latest Publications

ADDHard
Sai Manoj Pudukotai Dinakarrao, A. Jantsch
Pub Date: 2018-05-30 · DOI: 10.1145/3194554.3194647
Abstract: Anomaly detection in electrocardiogram (ECG) signals facilitates the diagnosis of cardiovascular diseases such as arrhythmias. Existing methods, although fairly accurate, demand a large amount of computational resources. Based on pre-processing of the ECG signal, we present ADDHard, a low-complexity digital hardware implementation for arrhythmia detection. ADDHard combines low power consumption with a small footprint, making it especially suitable for resource-constrained systems such as body-wearable devices. Tested on the MIT-BIH arrhythmia database, the implementation achieved an average accuracy of 97.28% with a specificity of 98.25%.
Citations: 10
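The abstract above describes hardware anomaly detection on pre-processed ECG signals. ADDHard's actual pipeline is not reproduced here, but the general idea of flagging abnormal heartbeats can be sketched with a simple RR-interval threshold test (both the RR-interval method and the thresholds are illustrative assumptions, not the paper's algorithm):

```python
def detect_anomalies(rr_intervals_ms, low=600, high=1000):
    """Flag beats whose RR interval (ms) falls outside a normal band.

    Illustrative sketch only: ADDHard's pre-processing and detection
    logic are more involved; the thresholds here are assumed.
    """
    return [i for i, rr in enumerate(rr_intervals_ms)
            if rr < low or rr > high]

# A run of mostly normal intervals with one premature beat (short RR)
rr = [800, 810, 790, 400, 820, 805]
print(detect_anomalies(rr))  # the premature beat at index 3 is flagged
```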
An Efficient Cache Management Scheme for Capacitor Equipped Solid State Drives
Congming Gao, Liang Shi, Yejia Di, Qiao Li, C. Xue, E. Sha
Pub Date: 2018-05-30 · DOI: 10.1145/3194554.3194639
Abstract: Within SSDs, random access memory (RAM) is adopted as a cache inside the controller to improve performance. However, because RAM is volatile, data loss can occur on a sudden power interruption. To address this, emerging SSDs are equipped with a capacitor as an interim power supply. Capacitor aging, however, causes capacitance to decrease over time; once the remaining capacitance can no longer write all dirty pages in the cache back to flash memory, data loss may occur. To solve this issue, we propose an efficient cache management scheme for capacitor-equipped SSDs. The basic idea is to bound the number of dirty pages in the cache within the capability of the capacitor. Simulation results show that the proposed scheme achieves encouraging improvements in lifetime and performance while avoiding power-interruption-induced data loss.
Citations: 5
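The core idea of the scheme above — capping the number of dirty pages so the capacitor can always flush them on power loss — can be sketched as a write-back cache with a dirty-page budget (the class shape and eager-flush policy below are illustrative assumptions, not the paper's design):

```python
from collections import OrderedDict

class BoundedDirtyCache:
    """Write-back cache that never holds more dirty pages than the
    capacitor could flush to flash on power loss. Illustrative sketch."""

    def __init__(self, capacity, max_dirty):
        self.capacity = capacity
        self.max_dirty = max_dirty   # pages the capacitor can flush
        self.pages = OrderedDict()   # page_id -> dirty flag, LRU order
        self.flushed = []            # pages written back to flash

    def write(self, page_id):
        # Evict the LRU page if the cache is full
        if page_id not in self.pages and len(self.pages) >= self.capacity:
            old, dirty = self.pages.popitem(last=False)
            if dirty:
                self.flushed.append(old)
        self.pages[page_id] = True
        self.pages.move_to_end(page_id)
        # Enforce the bound: eagerly clean the oldest dirty page
        while sum(self.pages.values()) > self.max_dirty:
            for pid, dirty in self.pages.items():
                if dirty:
                    self.pages[pid] = False
                    self.flushed.append(pid)
                    break

cache = BoundedDirtyCache(capacity=4, max_dirty=2)
for p in [1, 2, 3]:
    cache.write(p)
# At most 2 dirty pages remain; page 1 was flushed early to flash
```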
Dataflow-Based Mapping of Spiking Neural Networks on Neuromorphic Hardware
Anup Das, Akash Kumar
Pub Date: 2018-05-30 · DOI: 10.1145/3194554.3194627
Abstract: Spiking Neural Networks (SNNs) are powerful computation engines for pattern recognition and image classification applications. Beyond application performance such as recognition and classification accuracy, system performance such as throughput becomes important when executing these applications on hardware. We propose a systematic design flow to map SNN-based applications onto crossbar-based neuromorphic hardware, guaranteeing both application and system performance. Synchronous Dataflow Graphs (SDFGs) with extended semantics are used to model these applications and represent neural network topologies. Self-timed scheduling is then used to analyze throughput, incorporating hardware constraints such as synaptic memory and the communication and I/O bandwidth of the crossbars. Our design flow integrates CARLsim, a GPU-accelerated application-level SNN simulator, with SDF3, a tool for mapping SDFGs onto hardware. We conducted experiments with realistic and synthetic SNNs on representative neuromorphic hardware, demonstrating throughput-resource trade-offs for a given application performance. For throughput-constrained applications, we show an average 20% reduction in hardware usage with a 19% reduction in energy consumption. For throughput-scalable applications, we show an average 53% higher throughput compared to a state-of-the-art approach.
Citations: 28
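One ingredient of a flow like the one above is assigning neurons to crossbars without exceeding each crossbar's synaptic memory. A toy stand-in for that step is first-fit bin packing (the paper's flow additionally models bandwidth and analyzes throughput via self-timed scheduling of SDFGs; this sketch covers only the capacity constraint):

```python
def map_to_crossbars(neuron_synapses, crossbar_capacity):
    """First-fit assignment of neurons to crossbars under a synaptic
    memory constraint. Illustrative stand-in for the mapping step;
    neuron_synapses[i] is neuron i's synapse (fan-in) count."""
    crossbars = []   # list of [remaining_capacity, [neuron ids]]
    for nid, syn in enumerate(neuron_synapses):
        for cb in crossbars:
            if cb[0] >= syn:         # fits in an existing crossbar
                cb[0] -= syn
                cb[1].append(nid)
                break
        else:                        # open a new crossbar
            crossbars.append([crossbar_capacity - syn, [nid]])
    return [cb[1] for cb in crossbars]

# Four neurons with differing fan-in, crossbars holding 100 synapses each
print(map_to_crossbars([60, 50, 40, 30], 100))  # -> [[0, 2], [1, 3]]
```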
Session details: Session 7: Machine Learning and HW Accelerators
Fatemeh Tehranipoor
Pub Date: 2018-05-30 · DOI: 10.1145/3252913
Citations: 0
Energy and Performance Efficient Computation Offloading for Deep Neural Networks in a Mobile Cloud Computing Environment
Amir Erfan Eshratifar, Massoud Pedram
Pub Date: 2018-05-30 · DOI: 10.1145/3194554.3194565
Abstract: In today's computing landscape, mobile devices are computationally weak while large cloud servers can handle expensive workloads, so intensive computing tasks are typically offloaded to the cloud. Recent advances in learning techniques have enabled Deep Neural Networks (DNNs) to be deployed in a wide range of applications. Commercial speech-based intelligent personal assistants (IPAs) like Apple's Siri, which employs a DNN as its recognition model, operate solely over the cloud. The cloud-only approach may require a large amount of data transfer between the cloud and the mobile device, while the mobile-only approach may lack performance efficiency. In addition, the cloud server may be slow at times due to congestion and subscription limits, and mobile devices may face battery usage constraints. In this paper, we investigate the efficiency of offloading only some parts of the computation in DNNs to the cloud. We formulate an optimal computation offloading framework for forward propagation in DNNs that adapts to battery usage constraints on the mobile side and limited available resources on the cloud. Our simulation results show that the framework achieves a 1.42x average and up to 3.07x speedup in execution time on the mobile device, along with a 2.11x average and up to 4.26x reduction in mobile energy consumption.
Citations: 77
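The idea of offloading only part of a DNN's forward pass can be sketched as choosing a split layer that minimizes total cost: earlier layers run on the mobile device, later layers on the cloud, plus the cost of shipping the intermediate activations. The exhaustive search and all cost numbers below are illustrative assumptions; the paper's formulation also models battery and cloud-resource constraints:

```python
def best_split(layer_cost_mobile, layer_cost_cloud, transfer_cost):
    """Pick split index k: layers [0, k) run on the mobile device,
    layers [k, n) on the cloud, plus transfer_cost[k] for shipping
    activations at the split (k == n means mobile-only, no transfer).
    Illustrative exhaustive search over split points."""
    n = len(layer_cost_mobile)
    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):
        cost = (sum(layer_cost_mobile[:k])
                + (transfer_cost[k] if k < n else 0)
                + sum(layer_cost_cloud[k:]))
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost

# Made-up costs: mobile is slow on deep layers, activations shrink
# with depth, so a mid-network split wins over cloud-only or mobile-only.
mobile = [5, 5, 20, 20]      # per-layer cost on the device
cloud = [1, 1, 1, 1]         # per-layer cost on the server
transfer = [50, 8, 2, 2]     # cost of shipping activations at split k
print(best_split(mobile, cloud, transfer))  # -> (2, 14)
```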
Energy Consumption and Lifetime Improvement of Coarse-Grained Reconfigurable Architectures Targeting Low-Power Error-Tolerant Applications
H. Afzali-Kusha, O. Akbari, M. Kamal, M. Pedram
Pub Date: 2018-05-30 · DOI: 10.1145/3194554.3194631
Abstract: In this work, we present the application of a voltage over-scaling (VOS) technique for improving the lifetime and reliability of coarse-grained reconfigurable architectures (CGRAs). The proposed technique, which may be applied to CGRAs used as accelerators for low-power, error-tolerant applications, reduces the (strongly voltage-dependent) wearout effects and the energy consumption of processing elements (PEs) whenever the error's impact on output quality can be tolerated. This lessens wearout and reduces the energy consumption of PEs when the accuracy requirement on the results is relatively low; multiple degrees of computational accuracy can be achieved by using different over-scaled voltage levels for the PEs. The efficacy of the technique is studied under bias temperature instability for two error-resilient applications. The CGRAs are implemented in 15 nm FinFET technology at a nominal supply voltage of 0.8 V, with 0.75, 0.7, 0.65, and 0.6 V considered as over-scaled levels. Based on the quality constraints of the benchmarks, optimal over-scaled voltage levels for the various PEs are determined and applied. The approach can provide considerable lifetime and energy consumption improvements over conventional exact and approximate computation approaches.
Citations: 6
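The selection step described above — choosing the most aggressive over-scaled voltage that still meets a quality constraint — can be sketched as follows. The voltage levels come from the abstract, but the error-vs-voltage figures and the simple feasibility rule are illustrative assumptions, not the paper's data or optimization method:

```python
def pick_voltage(levels, error_at, max_error):
    """Return the lowest supply voltage (V) whose estimated output
    error meets the quality constraint; fall back to the nominal
    (highest) voltage if nothing is feasible. Illustrative sketch."""
    feasible = [v for v in levels if error_at[v] <= max_error]
    return min(feasible) if feasible else max(levels)

# Over-scaled levels from the abstract; error figures are made up.
levels = [0.8, 0.75, 0.7, 0.65, 0.6]
error_at = {0.8: 0.0, 0.75: 0.01, 0.7: 0.03, 0.65: 0.08, 0.6: 0.2}
print(pick_voltage(levels, error_at, max_error=0.05))  # -> 0.7
```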
MC3A
Lahir Marni, M. Hosseini, T. Mohsenin
Pub Date: 2018-05-30 · DOI: 10.1145/3194554.3194577
Abstract: The paper presents MC3A (Markov Chain Monte Carlo Many-Core Accelerator), a high-throughput, domain-specific, programmable many-core accelerator that efficiently generates samples from a given target distribution. MCMC samplers are used in computationally intensive machine learning, image, and signal processing applications, where high-throughput sampling is of paramount importance. To achieve a high-throughput platform, we add two domain-specific instructions with dedicated hardware for functions used extensively in MCMC algorithms; these instructions reduce the number of clock cycles needed for the respective functions by 10x and 21x. A 64-cluster MC3A architecture is fully placed and routed in 65 nm TSMC CMOS technology, where the VLSI layout of each cluster occupies 0.577 mm^2 and consumes 247 mW at a 1 GHz clock frequency. MC3A achieves 6x higher throughput than its predecessor (PENC) and consumes 4x lower energy per sample. Compared to off-the-shelf platforms such as the Jetson TX1 and TX2 SoCs, MC3A delivers 195x and 191x higher throughput and consumes 808x and 726x lower energy per sample, respectively.
Citations: 3
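For readers unfamiliar with the workload MC3A accelerates: an MCMC sampler draws samples from a target distribution via a propose-accept loop. A pure-software random-walk Metropolis sketch of that class of algorithm (the accelerator's specific instructions and kernels are not modeled here) looks like this:

```python
import math
import random

def metropolis(log_target, x0, steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler: the class of MCMC algorithm a
    hardware sampler like MC3A speeds up. Software sketch only."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + rng.gauss(0, step_size)
        # Accept with probability min(1, target(prop) / target(x))
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

# Target: a standard normal, specified up to its normalizing constant
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, steps=5000)
mean = sum(samples) / len(samples)   # should land near 0
```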
Scalable Hardware Accelerator for Mini-Batch Gradient Descent
Sandeep Rasoori, V. Akella
Pub Date: 2018-05-30 · DOI: 10.1145/3194554.3194559
Abstract: Iterative first-order methods that use gradient information form the core computation kernels of modern statistical data analytics engines such as MADlib, Impala, Google Brain, GraphLab, and MLlib in Spark. Even the most advanced parallel stochastic gradient descent algorithms, such as Hogwild, do not scale well on conventional chip multiprocessors because of memory-system bottlenecks when sharing large model vectors. We propose a scalable architecture for large-scale parallel gradient descent on a Field Programmable Gate Array (FPGA) that takes advantage of the large amount of embedded memory in modern FPGAs. A novel data layout mechanism eliminates the need for expensive synchronization and locking of shared data, which makes the architecture scalable. A 32-PE system on a Stratix V FPGA shows about a 5x performance improvement over a state-of-the-art implementation on a 14-core/28-thread Intel Xeon CPU with 64 GB of memory operating at 2.6 GHz.
Citations: 2
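The computation kernel this accelerator parallelizes is ordinary mini-batch gradient descent. A plain-Python reference for least-squares linear regression (the example problem, step size, and batch size are illustrative; the paper's contribution is the FPGA data layout, not the algorithm):

```python
def minibatch_gd(X, y, lr=0.1, batch=2, epochs=200):
    """Mini-batch gradient descent for least-squares linear regression:
    the kernel an accelerator like the one above parallelizes.
    Sketch without a bias term or shuffling."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        for start in range(0, n, batch):
            xs, ys = X[start:start + batch], y[start:start + batch]
            grad = [0.0] * d
            for xi, yi in zip(xs, ys):
                err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
                for j in range(d):
                    grad[j] += err * xi[j]
            for j in range(d):          # average gradient over the batch
                w[j] -= lr * grad[j] / len(xs)
    return w

# Data generated by y = 2*x1 + 3*x2, so w should converge near [2, 3]
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 3, 5, 7]
w = minibatch_gd(X, y)
```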
FLexiTASK
Joel Mandebi Mbongue, Danielle Tchuinkou Kwadjo, C. Bobda
Pub Date: 2018-05-30 · DOI: 10.1145/3194554.3194644
Abstract: One of the major obstacles to the adoption of FPGAs in high-performance computing is their programmability, which requires hardware design skills and long compilation times. Overlays have been proposed as a way to abstract FPGA resources, but the topologies they use to connect computing cores usually restrict where tasks can be placed and how they communicate. In this paper, we propose an overlay architecture designed for efficiency and flexibility. It features a novel Network-on-Chip (NoC) infrastructure that makes the placement of hardware tasks fully flexible, allows tasks to communicate with low latency, and eases runtime reconfiguration of desired areas of the fabric. Prototyped on an Altera Cyclone V FPGA, the architecture reaches a maximum frequency of 282 MHz and achieves speedups ranging from 4x to 195x over native execution in some applications.
Citations: 12
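As background for the NoC discussion above: a common baseline for routing between cores on an overlay NoC is dimension-ordered (XY) routing on a 2D mesh. The sketch below is only that baseline, not FLexiTASK's more flexible routing scheme:

```python
def xy_route(src, dst):
    """Hop-by-hop XY (dimension-ordered) route between two tiles on a
    2D mesh NoC: travel along X first, then along Y. A baseline sketch,
    not FLexiTASK's routing."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```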
Low-Energy Architectures of Linear Classifiers for IoT Applications using Incremental Precision and Multi-Level Classification
Sandhya Koteshwara, K. Parhi
Pub Date: 2018-05-30 · DOI: 10.1145/3194554.3194603
Abstract: This paper presents a novel incremental-precision classification approach that reduces the energy consumption of linear classifiers for IoT applications. Features are first input to a low-precision classifier; if the classifier successfully classifies the sample, the process terminates. Otherwise, classification performance is incrementally improved using a classifier of higher precision, repeating until classification is complete. The argument is that many samples can be classified by the low-precision classifier alone, reducing energy. To achieve incremental precision, a novel data-path decomposition is proposed for the design of fixed-width adders and multipliers; these components improve precision without recalculating outputs, further reducing energy. Using a linear classification example, it is shown that the proposed incremental-precision, multi-level classifier approach can reduce energy by about 41% while achieving accuracy comparable to that of a full-precision system.
Citations: 3
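The escalation policy described above can be sketched in software: evaluate a linear classifier with coarsely quantized weights first, and move to higher precision only when the score sits too close to the decision boundary. The margin-based confidence rule, the precision ladder, and the rounding scheme below are illustrative assumptions; the paper's contribution is the incremental data-path hardware, which reuses rather than recomputes partial results:

```python
def quantize(w, bits):
    """Round weights onto a fixed-point grid with `bits` fractional bits."""
    scale = 1 << bits
    return [round(wi * scale) / scale for wi in w]

def classify_incremental(w, x, precisions=(2, 4, 8), margin=0.25):
    """Evaluate a linear classifier at increasing precision, stopping as
    soon as the score is confidently away from the decision boundary.
    Returns (label, bits used). Illustrative sketch."""
    for bits in precisions:
        score = sum(wq * xi for wq, xi in zip(quantize(w, bits), x))
        if abs(score) > margin or bits == precisions[-1]:
            return (1 if score >= 0 else -1), bits

w = [0.30, -0.45]
print(classify_incremental(w, [3, -2]))   # easy sample: decided at 2 bits
print(classify_incremental(w, [1, 0.5]))  # near-boundary: escalates to 8 bits
```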