TNP: A Step Towards Elastic Training

2023 International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan). Pub Date: 2023-07-17. DOI: 10.1109/ICCE-Taiwan58799.2023.10226742

Li-Chung Yeng, Wei-Tsong Lee, Hsin-Wen Wei

Citations: 0

Abstract

As machine learning models keep growing in size and GPU release cycles stay short, hardware becomes outdated quickly. To cope with ever-growing model sizes, we look for ways to better utilize the computing power we already possess. This paper implements a makespan-aware distributed training framework called Train ‘N’ Play (TNP), which makes training large models on large datasets possible on systems that originally could not accomplish it.
