In-memory Activation Compression for GPT Training

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11
DOI: 10.1109/AICAS57966.2023.10168658

Seungyong Lee, Geonu Yun, Hyuk-Jae Lee

Citations: 0

Abstract

The large parameter counts of recent Transformer-based language models cause memory shortages during training. Remedies such as mixed precision and model parallelism have been proposed, but they induce communication overhead and require a programmer to modify the model. To address this issue, we propose a scheme that compresses activation data in memory, reducing memory usage during training in a user-transparent manner. The compression algorithm gathers activation data into a block and compresses it, using base-delta compression for the exponent and bit-plane zero compression for the sign and mantissa. The important bits are then arranged in order, and LSB truncation is applied to fit the target size. The proposed algorithm achieves compression ratios of 2.09 for the sign, 2.04 for the exponent, and 1.21 for the mantissa. Including truncation, the overall compression ratio reaches 3.2, and we confirm that GPT-2 training converges with the compression applied.
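
The three steps named in the abstract (base-delta on the exponent, bit-plane zero compression on the sign and mantissa, LSB truncation to a target size) can be illustrated with a small size-accounting sketch. This is not the authors' implementation: the abstract does not state the number format, block size, or bit layout, so the half-precision format, the 3-bit delta-width header (`DELTA_HDR`), the per-field plane masks, and the function names below are all assumptions made for illustration. The sketch only counts how many bits a block would occupy; it does not emit a bitstream.

```python
import struct

N_EXP, N_MAN = 5, 10   # IEEE 754 half-precision field widths (format is an assumption)
DELTA_HDR = 3          # hypothetical header: bits that record the delta width (0..5)

def split_fp16(x):
    """Return the (sign, exponent, mantissa) bit fields of x as a half-precision float."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> N_MAN) & 0x1F, bits & 0x3FF

def base_delta_bits(values, width):
    """Bits to store values as one base (the block minimum) plus equal-width deltas."""
    base = min(values)
    delta_width = max(v - base for v in values).bit_length()
    return width + DELTA_HDR + delta_width * len(values)

def nonzero_planes(values, width):
    """Indices of bit-planes (MSB first) that contain at least one set bit."""
    return [k for k in range(width - 1, -1, -1) if any((v >> k) & 1 for v in values)]

def compress_block(block, target_bits):
    """Return (stored_bits, truncated_planes) for one block of activations."""
    signs, exps, mants = zip(*(split_fp16(x) for x in block))
    n = len(block)
    # Sign: 1 mask bit, plus the plane itself only if some value is negative.
    sign_bits = 1 + n * len(nonzero_planes(signs, 1))
    exp_bits = base_delta_bits(exps, N_EXP)
    # Mantissa: a 10-bit plane mask, then nonzero planes from MSB to LSB.
    total = sign_bits + exp_bits + N_MAN
    truncated = 0
    for _ in nonzero_planes(mants, N_MAN):
        if total + n <= target_bits:
            total += n          # plane fits in the budget, keep it
        else:
            truncated += 1      # this plane and all lower ones are dropped
    return total, truncated

block = [0.5, 0.75, 1.0, 1.5]
raw_bits = 16 * len(block)                          # 64 bits uncompressed
print(compress_block(block, target_bits=raw_bits))  # (27, 0): lossless
print(compress_block(block, target_bits=24))        # (23, 1): one mantissa plane truncated
```

In this toy block the exponents differ by at most 1 and only the top mantissa plane is nonzero, so 64 raw bits shrink to 27 without loss. Tightening the target to 24 bits forces the remaining mantissa plane to be dropped, which is the lossy truncation step. Real activations have far denser mantissas, which is consistent with the much lower mantissa ratio (1.21) reported above.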
