2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)最新文献_第8页

Autopiler: An AI Based Framework for Program Autotuning and Options Recommendation Autopiler:一个基于AI的程序自动调整和选项推荐框架

2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2019-03-01 DOI: 10.1109/AICAS.2019.8771625

Kang-Lin Wang, Chi-Bang Kuan, Jiann-Fuh Liaw, Wei-Liang Kuo

{"title":"Autopiler: An AI Based Framework for Program Autotuning and Options Recommendation","authors":"Kang-Lin Wang, Chi-Bang Kuan, Jiann-Fuh Liaw, Wei-Liang Kuo","doi":"10.1109/AICAS.2019.8771625","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771625","url":null,"abstract":"Program autotuning has been proved to achieve great performance improvement in many compiler usage scenarios. Many autotuning frameworks have been provided to support fully-customizable configuration representations, a wide variety of representations for domain-specific tuning, and a user friendly interface for interaction between the program and the autotuner. However, tuning programs takes time, no matter it is autotuned or manually tuned. Oftentimes, programmers don’t have the time waiting for autotuners to finish and want to have rather good options to use instantly. This paper introduces Autopiler, a framework for building non-domain-specific program autotuners with machine learning based recommender systems for options prediction. This framework supports not only non-domain-specific tuning techniques, but also learns from previous tuning results and can make adequate good options recommendation before any tuning happens. We will illustrate the architecture of Autopiler and how to leverage recommender system for compiler options recommendation, in such way Autopiler can learn from the programs and becomes an AI boosted smart compiler. The experiment results show that Autopiler can deliver up to 19.46% performance improvement for in-house 4G LTE modem workloads.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"7 1-4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131492105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Intelligent Policy Selection for GPU Warp Scheduler GPU Warp Scheduler的智能策略选择

2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2019-03-01 DOI: 10.1109/AICAS.2019.8771596

L. Chiou, Tsung-Han Yang, Jian-Tang Syu, Che-Pin Chang, Yeong-Jar Chang

引用次数: 1

Improved Hybrid Memory Cube for Weight-Sharing Deep Convolutional Neural Networks 基于权重共享深度卷积神经网络的改进混合记忆立方体

2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2019-03-01 DOI: 10.1109/AICAS.2019.8771540

Hao Zhang, Jiongrui He, S. Ko

{"title":"Improved Hybrid Memory Cube for Weight-Sharing Deep Convolutional Neural Networks","authors":"Hao Zhang, Jiongrui He, S. Ko","doi":"10.1109/AICAS.2019.8771540","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771540","url":null,"abstract":"In recent years, many deep neural network accelerator architectures are proposed to improve the performance of processing deep neural network models. However, memory bandwidth is still the major issue and performance bottleneck of the deep neural network accelerators. The emerging 3D memory, such as hybrid memory cube (HMC) and processing-in-memory techniques provide new solutions to deep neural network implementation. In this paper, a novel HMC architecture is proposed for weight-sharing deep convolutional neural networks in order to solve the memory bandwidth bottleneck during the neural network implementation. The proposed HMC is designed based on conventional HMC architecture with only minor changes. In the logic layer, the vault controller is modified to enable parallel vault access. The weight parameters of pre-trained convolutional neural network are quantized to 16 numbers. During processing, the accumulation of the activations with shared weights is performed and only the accumulated results are transferred to the processing elements to perform multiplications with weights. By using this proposed architecture, the data transfer between main memory and processing elements can be reduced and the throughout of convolution operations can be improved by 30% compared to using HMC based multiply-accumulate design.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116755644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Outstanding Bit Error Tolerance of Resistive RAM-Based Binarized Neural Networks 电阻式ram二值化神经网络的容错性能

2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2019-03-01 DOI: 10.1109/AICAS.2019.8771544

T. Hirtzlin, M. Bocquet, Jacques-Olivier Klein, E. Nowak, E. Vianello, J. Portal, D. Querlioz

{"title":"Outstanding Bit Error Tolerance of Resistive RAM-Based Binarized Neural Networks","authors":"T. Hirtzlin, M. Bocquet, Jacques-Olivier Klein, E. Nowak, E. Vianello, J. Portal, D. Querlioz","doi":"10.1109/AICAS.2019.8771544","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771544","url":null,"abstract":"Resistive random access memories (RRAM) are novel nonvolatile memory technologies, which can be embedded at the core of CMOS, and which could be ideal for the in-memory implementation of deep neural networks. A particularly exciting vision is using them for implementing Binarized Neural Networks (BNNs), a class of deep neural networks with a highly reduced memory footprint. The challenge of resistive memory, however, is that they are prone to device variation, which can lead to bit errors. In this work we show that BNNs can tolerate these bit errors to an outstanding level, through simulations of networks on the MNIST and CIFAR10 tasks. If a standard BNN is used, up to 10−4 bit error rate can be tolerated with little impact on recognition performance on both MNIST and CIFAR10. We then show that by adapting the training procedure to the fact that the BNN will be operated on error-prone hardware, this tolerance can be extended to a bit error rate of 4 × 10−2. The requirements for RRAM are therefore a lot less stringent for BNNs than more traditional applications. We show, based on experimental measurements on a RRAM HfO2 technology, that this result can allow reduce RRAM programming energy by a factor 30.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121909140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 26

Online Anomaly Detection in HPC Systems 高性能计算系统中的在线异常检测

2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2019-02-22 DOI: 10.1109/AICAS.2019.8771527

Andrea Borghesi, Antonio Libri, L. Benini, Andrea Bartolini

引用次数: 24

Extended Bit-Plane Compression for Convolutional Neural Network Accelerators 卷积神经网络加速器的扩展位平面压缩

2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2018-10-01 DOI: 10.1109/AICAS.2019.8771562

L. Cavigelli, L. Benini

引用次数: 18

Elastic Neural Networks for Classification 弹性神经网络分类

2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 2018-10-01 DOI: 10.1109/AICAS.2019.8771475

Yi Zhou, Yue Bai, S. Bhattacharyya, H. Huttunen

引用次数: 12

Design of Intelligent EEG System for Human Emotion Recognition with Convolutional Neural Network 基于卷积神经网络的人类情绪识别智能脑电图系统设计

2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 1900-01-01 DOI: 10.1109/AICAS.2019.8771581

Kai-Yen Wang, Yun-Lung Ho, Yu-De Huang, W. Fang

引用次数: 20

An Energy-Efficient Accelerator with Relative- Indexing Memory for Sparse Compressed Convolutional Neural Network 稀疏压缩卷积神经网络中具有相对索引存储器的高效加速器

2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 1900-01-01 DOI: 10.1109/AICAS.2019.8771600

I-Chen Wu, Po-Tsang Huang, Chin-Yang Lo, W. Hwang

{"title":"An Energy-Efficient Accelerator with Relative- Indexing Memory for Sparse Compressed Convolutional Neural Network","authors":"I-Chen Wu, Po-Tsang Huang, Chin-Yang Lo, W. Hwang","doi":"10.1109/AICAS.2019.8771600","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771600","url":null,"abstract":"Deep convolutional neural networks (CNNs) are widely used in image recognition and feature classification. However, deep CNNs are hard to be fully deployed for edge devices due to both computation-intensive and memory-intensive workloads. The energy efficiency of CNNs is dominated by off-chip memory accesses and convolution computation. In this paper, an energy-efficient accelerator is proposed for sparse compressed CNNs by reducing DRAM accesses and eliminating zero-operand computation. Weight compression is utilized for sparse compressed CNNs to reduce the required memory capacity/bandwidth and a large portion of connections. Thus, ReLU function produces zero-valued activations. Additionally, the workloads are distributed based on channels to increase the degree of task parallelism, and all-row- to-all-row non-zero element multiplication is adopted for skipping redundant computation. The simulation results over the dense accelerator show that the proposed accelerator achieves 1.79x speedup and reduces 23.51%, 69.53%, 88.67% on-chip memory size, energy, and DRAM accesses of VGG-16.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127244267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

A Customized Convolutional Neural Network Design Using Improved Softmax Layer for Real-time Human Emotion Recognition 基于改进Softmax层的自定义卷积神经网络设计用于实时人类情绪识别

2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) Pub Date : 1900-01-01 DOI: 10.1109/AICAS.2019.8771616

Kai-Yen Wang, Yu-De Huang, Yun-Lung Ho, W. Fang

引用次数: 8