2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2) — Latest Publications

Deep Learning Inference on Embedded Devices: Fixed-Point vs Posit
Seyed Hamed Fatemi Langroudi, Tej Pandit, D. Kudithipudi
DOI: 10.1109/EMC2.2018.00012 | Published: 2018-05-22
Abstract: Performing the inference step of deep learning in resource-constrained environments, such as embedded devices, is challenging. Success requires optimization at both the software and hardware levels. Low-precision arithmetic, and specifically low-precision fixed-point number systems, have become the standard for performing deep learning inference. However, representing non-uniformly distributed data and parameters (e.g. weights) with uniformly distributed fixed-point values remains a major drawback of this number system. Recently, the posit number system was proposed, which represents numbers in a non-uniform manner. This motivates us to explore using the posit number system to represent the weights of deep convolutional neural networks. We apply no quantization techniques, so the network weights require no re-training. The results of this exploration show that the posit number system outperforms the fixed-point number system in terms of accuracy and memory utilization.
Citations: 36
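The posit format compared above replaces fixed-point's uniform spacing with a tapered one: a variable-length "regime" field trades fraction bits for dynamic range. As a rough illustration (not the authors' code), here is a minimal Python decoder for 8-bit posits, following the standard field layout (sign, regime, optional exponent, fraction); the function name is ours.

```python
def decode_posit8(bits, es=0):
    """Decode an 8-bit posit with `es` exponent bits to a float (sketch)."""
    n = 8
    if bits == 0:
        return 0.0
    if bits == 0x80:
        return float("nan")  # NaR (Not a Real)
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & (2 ** n - 1)  # posit negation is two's complement
    # Remaining bits after the sign, most significant first.
    rest = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]
    # Regime: a run of identical bits, terminated by the opposite bit.
    first, run = rest[0], 1
    while run < len(rest) and rest[run] == first:
        run += 1
    regime = run - 1 if first == 1 else -run
    idx = run + 1 if run < len(rest) else run  # skip run + terminator
    # Optional exponent bits, then the fraction with an implicit leading 1.
    exp = 0
    for b in rest[idx:idx + es]:
        exp = exp * 2 + b
    frac, scale = 1.0, 0.5
    for b in rest[idx + es:]:
        frac += b * scale
        scale /= 2
    useed = 2 ** (2 ** es)
    value = (useed ** regime) * (2 ** exp) * frac
    return -value if sign else value
```

For es=0, 0x40 decodes to 1.0, 0x60 to 2.0, 0x20 to 0.5, and 0xC0 to -1.0: values cluster densely around 1 and thin out toward the extremes, which is why posits can match the distribution of trained weights better than fixed point.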
Event Prediction in Processors Using Deep Temporal Models
Tharindu Mathew, Aswin Raghavan, S. Chai
DOI: 10.1109/EMC2.2018.00014 | Published: 2018-03-25
Abstract: To achieve high processing efficiency, next-generation computer architecture designs need an effective artificial intelligence (AI) framework to learn large-scale processor interactions. In this short paper, we present Deep Temporal Models (DTMs), which offer effective and scalable time-series representations addressing the key challenges of learning from processor data: high data rate, cyclic patterns, and high dimensionality. We present our approach to using DTMs to learn and predict processor events, and show comparisons of these learning models with promising initial simulation results.
Citations: 0
A High Efficiency Accelerator for Deep Neural Networks
Aliasger Zaidy, Andre Xian Ming Chang, Vinayak Gokhale, E. Culurciello
DOI: 10.1109/EMC2.2018.00010 | Published: 2018-03-01
Abstract: Deep Neural Networks (DNNs) are the current state of the art for tasks such as object detection, natural language processing, and semantic segmentation. These networks are massively parallel, hierarchical models, with each level of the hierarchy performing millions of operations on a single input. This enormous amount of parallel computation makes DNNs well suited to custom acceleration. Custom accelerators can provide real-time DNN inference at low power, enabling widespread embedded deployment. In this paper, we present Snowflake, a high-efficiency, low-power accelerator for DNNs. Snowflake was designed to achieve optimum occupancy at low bandwidths, and it is agnostic to the network architecture. Snowflake was implemented on the Xilinx Zynq XC7Z045 APSoC and achieves a peak performance of 128 G-ops/s. It sustains a throughput of 98 FPS on AlexNet while averaging 1.2 GB/s of memory bandwidth.
Citations: 0
A Quantization-Friendly Separable Convolution for MobileNets
Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, M. Aleksic
DOI: 10.1109/EMC2.2018.00011 | Published: 2018-03-01
Abstract: As deep learning (DL) is rapidly pushed to edge computing, researchers have invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning and parameter compression. Quantization, one of the key approaches, can effectively offload the GPU and makes it possible to deploy DL on a fixed-point pipeline. Unfortunately, not all existing network designs are friendly to quantization. For example, while the popular lightweight MobileNetV1 successfully reduces parameter size and computation latency with separable convolution, our experiments show a large performance gap between its quantized and floating-point models. To resolve this, we analyzed the root cause of the quantization loss and propose a quantization-friendly separable convolution architecture. Evaluated on the ImageNet2012 image classification task, our modified MobileNetV1 model achieves 68.03% top-1 accuracy with 8-bit inference, nearly closing the gap to the floating-point pipeline.
Citations: 101
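The fixed-point pipeline mentioned above rests on a simple primitive: mapping floating-point tensors onto an 8-bit grid via a scale factor. A minimal sketch of symmetric per-tensor int8 quantization follows; the function names are illustrative, not from the paper.

```python
def quantize_int8(values):
    """Symmetric per-tensor linear quantization to int8 (illustrative sketch)."""
    # One scale maps the largest magnitude onto the int8 range [-127, 127].
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [x * scale for x in q]
```

Round-tripping a weight vector through this pair bounds the per-element error by half a quantization step (scale / 2), which is the error budget a "quantization-friendly" architecture tries to keep from compounding across layers.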
A Case for Dynamic Activation Quantization in CNNs
Karl Taht, Surya Narayanan, R. Balasubramonian
DOI: 10.1109/EMC2.2018.00009 | Published: 2018-03-01
Abstract: It is well established that CNNs are robust enough to tolerate low-precision computation without significant loss in accuracy. Prior work exploits this fact by allocating different precision to different layers (for both weights and activations), depending on how strongly a layer's precision dictates the prediction accuracy. In all of these works, the layer-wise precision of weights and activations is decided for a network through offline design-space exploration and retraining of weights. While these approaches show significant energy improvements, they make global decisions about precision requirements. In this project, we try to answer the question: "Can we vary the inter- and intra-layer bit precision based on the region-wise importance of each individual input?" The intuition is that a particular image may contain regions that are background or unimportant for the network's final prediction. As these inputs propagate through the network, the corresponding low-importance regions of each feature map can tolerate lower precision. Using metrics such as entropy, color gradient, and points of interest, we argue that a region of an image can be labeled important or unimportant, enabling lower precision for unimportant pixels.
We show that per-input activation quantization can reduce computational energy by up to 33.5% or 42.0% while maintaining the original top-1 and top-5 accuracies, respectively.
Citations: 0
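Entropy, the first of the importance metrics named above, is straightforward to compute per image block. As a hedged sketch (thresholds and bit-widths here are ours, not the paper's), a flat background block gets low precision and an information-rich block gets full precision:

```python
import math
from collections import Counter

def block_entropy(pixels):
    """Shannon entropy in bits of a flat list of pixel values."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def bits_for_block(pixels, hi=8, lo=4, threshold=2.0):
    """Assign activation bit-width by block entropy (illustrative policy)."""
    return hi if block_entropy(pixels) >= threshold else lo
```

A constant 16-pixel block has entropy 0 and would be quantized to 4 bits, while a block of 16 distinct values has entropy log2(16) = 4 bits and keeps 8-bit precision.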
Invited Talk Abstract: Introducing ReQuEST: An Open Platform for Reproducible and Quality-Efficient Systems-ML Tournaments
G. Fursin
DOI: 10.1109/emc2.2018.00008 | Published: 2018-03-01
Abstract: Co-designing efficient machine-learning-based systems across the whole application/hardware/software stack to trade off speed, accuracy, energy, and cost is becoming extremely complex and time-consuming. Researchers often struggle to evaluate and compare different published works across rapidly evolving software frameworks, heterogeneous hardware platforms, compilers, libraries, algorithms, data sets, models, and environments. I will present our community effort to develop an open co-design tournament platform with an online public scoreboard based on the Collective Knowledge workflow framework (CK). It gradually incorporates best research practices while providing a common way for multidisciplinary researchers to optimize and compare the quality vs. efficiency Pareto optimality of various workloads on diverse and complete hardware/software systems. All winning solutions will be made available to the community as portable and customizable "plug&play" components with a common API to accelerate research and innovation. I will then discuss how our open competition and collaboration can help achieve energy efficiency for cognitive workloads, based on energy-efficient submissions from the 1st ReQuEST tournament co-located with ASPLOS'18.
Further details: http://cKnowledge.org/request
Citations: 1
Moving CNN Accelerator Computations Closer to Data
Sumanth Gudaparthi, Surya Narayanan, R. Balasubramonian
DOI: 10.1109/EMC2.2018.00015 | Published: 2018-03-01
Abstract: A significant fraction of the energy in recent CNN accelerators is dissipated in moving operands between storage and compute units. In this work, we re-purpose the CPU's last-level cache to perform in-situ dot-product computations, significantly reducing data movement. Since a last-level cache has several subarrays, many such dot products can be performed in parallel, boosting throughput as well. The in-situ operation does not require analog circuits; it is performed with a bit-wise AND of two subarray rows, followed by digital aggregation of partial sums. The proposed architecture yields a 2.74× improvement in throughput and a 6.31× improvement in energy relative to a DaDianNao baseline, primarily because it eliminates a large fraction of data transfers over H-tree interconnects in the cache.
Citations: 1
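The AND-plus-aggregation trick works because an unsigned multiply-accumulate decomposes over bit planes: each pair of bit positions contributes popcount(AND of the two planes), shifted by the combined bit weight. A software sketch of that decomposition (our illustration of the principle, not the paper's RTL):

```python
def bitserial_dot(a, b, bits=4):
    """Dot product of unsigned vectors via per-bit-plane AND + popcount.

    Mimics in-cache row-wise AND followed by digital aggregation of
    partial sums, as described in the abstract above (illustrative sketch).
    """
    total = 0
    for i in range(bits):
        for j in range(bits):
            # AND the i-th bit plane of a with the j-th bit plane of b,
            # count the ones, and weight by the combined bit position.
            ones = sum(((x >> i) & 1) & ((y >> j) & 1) for x, y in zip(a, b))
            total += ones << (i + j)
    return total
```

For example, bitserial_dot([3, 1, 2, 7], [2, 2, 2, 5], bits=3) equals the ordinary dot product 3·2 + 1·2 + 2·2 + 7·5 = 47; in hardware, each inner iteration maps to one subarray-row AND.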
Keynote Abstract: Safety and Security at the Heart of Autonomous Driving
K. Khouri
DOI: 10.1109/EMC2.2018.00006 | Published: 2018-03-01
Abstract: The automotive industry is undergoing a revolution with connected, autonomous, and electric vehicles and the benefits they can bring to the public. Drivers enjoying their daily commute, fewer road fatalities, and less pollution are all possible thanks to new technologies. Car makers need to offer these features while ensuring vehicles remain safe and secure. In the coming years, there will be various levels of automation until we have fully autonomous vehicles. To achieve any level of automation, cars need to connect to other vehicles, connect to the infrastructure, sense the environment through sensors such as cameras and radar, and then make maneuvering decisions based on all of these inputs. Artificial intelligence is, and will be, deployed heavily to accomplish many of the tasks of autonomous driving. Perception and decision-making based on artificial intelligence introduce an entirely new set of challenges for car makers: ensuring no security compromises, and proving that the decisions being made are functionally, behaviorally, and environmentally safe. The challenge can be described in a simple question: "If a machine-learning-based car system is accurate 99% of the time, are you willing to ride in this car knowing that it will be wrong 1% of the time? What is the consequence of that incorrect decision?" Deep expertise and research in the safety and security aspects of AI are needed to ensure future mass deployment and success in autonomous driving.
Citations: 5
Efficient Compiler Code Generation for Deep Learning Snowflake Co-Processor
Andre Xian Ming Chang, Aliasger Zaidy, E. Culurciello
DOI: 10.1109/EMC2.2018.00013 | Published: 2018-03-01
Abstract: Deep Neural Networks (DNNs) are widely used in applications including image classification, semantic segmentation, and natural language processing. Various DNN models have been developed to achieve high accuracy on different tasks. Efficiently mapping the workloads of those models onto custom accelerators requires programmable hardware and a custom compiler. In this work, we use Snowflake, a programmable accelerator targeted at DNNs, and present a compiler that correctly generates code for it. Our system was evaluated on various convolution layers from AlexNet, ResNet, and LightCNN. Snowflake with 256 processing units was implemented on a Xilinx FPGA and achieves 70 frames/s on AlexNet without the linear layers.
Citations: 2
Invited Talk Abstract: Challenges and Solutions for Embedding Vision AI
Charles Qi
DOI: 10.1109/EMC2.2018.00007 | Published: 2018-03-01
Abstract: Computer vision and neural-network-based AI technology have recently seen explosive demand in embedded systems such as robots, drones, and autonomous vehicles. Due to cost and power constraints, it remains quite challenging to achieve satisfactory performance while maintaining power efficiency and scalability for embedded vision AI. This presentation first analyzes the technical challenges of embedded vision AI from the perspectives of algorithm complexity, computation and memory bandwidth demands, and constraints of the power consumption profile. The analysis shows that modern neural networks for vision AI contain complex topologies and diversified computation steps. These neural networks are often part of a larger embedded vision processing pipeline, intermixed with conventional vision algorithms. As a result, vision AI implementations demand several TOPS of computation performance and tens of GB/s of memory bandwidth. The architecture of the Tensilica Vision AI DSP processor technology is then presented, with three distinctive advantages: the optimized instruction sets of the Vision P6 and Vision C5 DSPs, as examples of instruction-level computation efficiency and performance; unique processor architecture features for SoC-level data-processing efficiency and scalability, which lead to a high-performance vision AI subsystem; and a fully automated AI optimization framework, software libraries, and tools that provide a practical performance-tuning methodology and rapid turnaround for embedded vision AI system design.
In conclusion, the presentation offers considerations for future research and development to bring embedded vision AI to the next performance level.
Citations: 4