资源受限设备的自适应精确训练

2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS) Pub Date : 2020-11-01 DOI:10.1109/ICDCS47774.2020.00185

Tian Huang, Tao Luo, Joey Tianyi Zhou

{"title":"资源受限设备的自适应精确训练","authors":"Tian Huang, Tao Luo, Joey Tianyi Zhou","doi":"10.1109/ICDCS47774.2020.00185","DOIUrl":null,"url":null,"abstract":"Learn in-situ is a growing trend for Edge AI. Training deep neural network (DNN) on edge devices is challenging because both energy and memory are constrained. Low precision training helps to reduce the energy cost of a single training iteration, but that does not necessarily translate to energy savings for the whole training process, because low precision could slows down the convergence rate. One evidence is that most works for low precision training keep an fp32 copy of the model during training, which in turn imposes memory requirements on edge devices. In this work we propose Adaptive Precision Training. It is able to save both total training energy cost and memory usage at the same time. We use model of the same precision for both forward and backward pass in order to reduce memory usage for training. Through evaluating the progress of training, APT allocates layer-wise precision dynamically so that the model learns quicker for longer time. APT provides an application specific hyper-parameter for users to play trade-off between training energy cost, memory usage and accuracy. Experiment shows that APT achieves more than 50% saving on training energy and memory usage with limited accuracy loss. 20% more savings of training energy and memory usage can be achieved in return for a 1% sacrifice in accuracy loss.","PeriodicalId":158630,"journal":{"name":"2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Adaptive Precision Training for Resource Constrained Devices\",\"authors\":\"Tian Huang, Tao Luo, Joey Tianyi Zhou\",\"doi\":\"10.1109/ICDCS47774.2020.00185\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Learn in-situ is a growing trend for Edge AI. Training deep neural network (DNN) on edge devices is challenging because both energy and memory are constrained. Low precision training helps to reduce the energy cost of a single training iteration, but that does not necessarily translate to energy savings for the whole training process, because low precision could slows down the convergence rate. One evidence is that most works for low precision training keep an fp32 copy of the model during training, which in turn imposes memory requirements on edge devices. In this work we propose Adaptive Precision Training. It is able to save both total training energy cost and memory usage at the same time. We use model of the same precision for both forward and backward pass in order to reduce memory usage for training. Through evaluating the progress of training, APT allocates layer-wise precision dynamically so that the model learns quicker for longer time. APT provides an application specific hyper-parameter for users to play trade-off between training energy cost, memory usage and accuracy. Experiment shows that APT achieves more than 50% saving on training energy and memory usage with limited accuracy loss. 20% more savings of training energy and memory usage can be achieved in return for a 1% sacrifice in accuracy loss.\",\"PeriodicalId\":158630,\"journal\":{\"name\":\"2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDCS47774.2020.00185\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS47774.2020.00185","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

原位学习是边缘人工智能的发展趋势。在边缘设备上训练深度神经网络(DNN)具有挑战性，因为能量和内存都受到限制。低精度训练有助于减少单个训练迭代的能量成本，但这并不一定转化为整个训练过程的能量节约，因为低精度可能会减慢收敛速度。一个证据是，大多数用于低精度训练的工作在训练期间保留了模型的fp32副本，这反过来又对边缘设备施加了内存要求。在这项工作中，我们提出自适应精确训练。它能够同时节省总训练能量成本和内存使用。我们对前向和后向传递使用相同精度的模型，以减少训练时的内存使用。APT通过对训练进度的评估，动态分配分层精度，使模型学习速度更快，学习时间更长。APT为用户提供了一个特定于应用的超参数，在训练能量成本、内存使用和准确性之间进行权衡。实验表明，APT在精度损失有限的情况下，节省了50%以上的训练能量和内存使用。以1%的准确度损失为代价，可以节省20%的训练能量和内存使用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Adaptive Precision Training for Resource Constrained Devices

Learn in-situ is a growing trend for Edge AI. Training deep neural network (DNN) on edge devices is challenging because both energy and memory are constrained. Low precision training helps to reduce the energy cost of a single training iteration, but that does not necessarily translate to energy savings for the whole training process, because low precision could slows down the convergence rate. One evidence is that most works for low precision training keep an fp32 copy of the model during training, which in turn imposes memory requirements on edge devices. In this work we propose Adaptive Precision Training. It is able to save both total training energy cost and memory usage at the same time. We use model of the same precision for both forward and backward pass in order to reduce memory usage for training. Through evaluating the progress of training, APT allocates layer-wise precision dynamically so that the model learns quicker for longer time. APT provides an application specific hyper-parameter for users to play trade-off between training energy cost, memory usage and accuracy. Experiment shows that APT achieves more than 50% saving on training energy and memory usage with limited accuracy loss. 20% more savings of training energy and memory usage can be achieved in return for a 1% sacrifice in accuracy loss.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)

自引率

0.00%

发文量