Dynamic scaling for low-precision learning

Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Pub Date: 2021-02-17. DOI: 10.1145/3437801.3441624

Ruobing Han, Min Si, J. Demmel, Yang You


Citations: 1

Abstract

In recent years, distributed deep learning has become popular in both industry and academia. Although researchers want to train on distributed systems, the communication cost of synchronizing gradients has been reported to be a bottleneck. Communicating low-precision gradients is a promising technique for reducing the bandwidth requirement. In this work, we propose Auto Precision Scaling (APS), an algorithm that improves accuracy when gradients are communicated as low-precision floating-point values. APS improves accuracy at all precisions with a trivial communication cost. Our experimental results show that, for both image classification and segmentation, APS can train state-of-the-art models with 8-bit floating-point gradients with no or only a tiny accuracy loss (<0.05%). Furthermore, we can avoid any accuracy loss by designing a hybrid-precision technique. Finally, we propose a performance model to evaluate the proposed method. Our experimental results show that APS achieves a significant speedup over the state-of-the-art method. To make it available to researchers and developers, we design and implement a high-performance system for Customized-Precision Deep learning (CPD), which can simulate the training process using an arbitrary low-precision customized floating-point format. We integrate CPD into PyTorch and release it as open source.
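The abstract does not spell out how APS chooses its scaling factors or how CPD is implemented, so the following is not the paper's algorithm. It is only a minimal sketch of the general idea that scaling gradients before casting them to a low-precision float can preserve values that would otherwise underflow. The helpers `quantize` and `pow2_scale`, the 5-bit exponent / 2-bit mantissa layout, and the sample gradient values are all assumptions made for illustration.

```python
import math

def quantize(x, exp_bits=5, man_bits=2):
    """Round x to the nearest value of a simulated low-precision float.

    Hypothetical helper: assumes an IEEE-like layout (bias, subnormals,
    top exponent code reserved) and saturates instead of producing inf.
    """
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias                     # exponent of the smallest normal value
    max_exp = 2 ** exp_bits - 2 - bias     # exponent of the largest normal value
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp
    mag = abs(x)
    _, e = math.frexp(mag)                 # mag lies in [2**(e-1), 2**e)
    exp = max(e - 1, min_exp)              # subnormals share the minimum exponent
    step = 2.0 ** (exp - man_bits)         # spacing of representable values here
    q = min(round(mag / step) * step, max_val)
    return math.copysign(q, x)

def pow2_scale(values, exp_bits=5):
    """Pick a power-of-two factor that moves the largest magnitude near the top
    of the representable range (an assumed policy, not taken from the paper)."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 2 - bias
    largest = max(abs(v) for v in values)
    if largest == 0.0:
        return 1.0
    _, e = math.frexp(largest)             # largest < 2**e
    return 2.0 ** (max_exp - e)            # so largest * scale < 2**max_exp

# Made-up gradient values, all smaller than the format's smallest subnormal.
grads = [3e-6, -1.2e-6, 8e-7]

plain = [quantize(g) for g in grads]                   # cast without scaling
scale = pow2_scale(grads)
restored = [quantize(g * scale) / scale for g in grads]  # scale, cast, unscale

print("unscaled cast:", plain)
print("scaled cast:  ", restored)
```

Because the factor is a power of two, multiplying and dividing by it changes only the exponent, so the only error introduced is the rounding of the mantissa. In a real distributed setting the workers would also have to agree on the factor and leave headroom for the summed gradients, which this sketch ignores.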

