MixTrain: accelerating DNN training via input mixing.

Impact factor: 3.0; JCR category: Computer Science, Artificial Intelligence (Q2)

Frontiers in Artificial Intelligence, vol. 7, article 1387936. Published: 2024-09-04; eCollection date: 2024-01-01. DOI: 10.3389/frai.2024.1387936

Sarada Krithivasan, Sanchari Sen, Swagath Venkataramani, Anand Raghunathan


Citations: 0

Abstract

Training Deep Neural Networks (DNNs) places immense compute requirements on the underlying hardware platforms, expending large amounts of time and energy. An important factor contributing to the long training times is the increasing dataset complexity required to reach state-of-the-art performance in real-world applications. To address this challenge, we explore the use of input mixing, where multiple inputs are combined into a single composite input with an associated composite label for training. The goal is for training on the mixed input to achieve an effect similar to training separately on each of the constituent inputs that it represents. This reduces the number of inputs (or mini-batches) processed in each epoch, proportionally reducing training time. We find that naive input mixing leads to a considerable drop in learning performance and model accuracy due to interference between the forward/backward propagation of the mixed inputs. We propose two strategies to address this challenge and realize training speedups from input mixing with minimal impact on accuracy. First, we reduce inter-input interference by exploiting the spatial separation between the features of the constituent inputs in the network's intermediate representations. We also adaptively vary the mixing ratio of the constituent inputs based on their loss in previous epochs. Second, we propose heuristics to automatically identify the subset of the training dataset that is subject to mixing in each epoch. Across ResNets of varying depth, MobileNetV2, and two Vision Transformer networks, we obtain up to 1.6× and 1.8× training speedups on the ImageNet and CIFAR-10 datasets, respectively, on an NVIDIA RTX 2080 Ti GPU, with negligible loss in classification accuracy.
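The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. A CutMix-style spatial split (each input occupies its own columns of the composite image) stands in for MixTrain's feature-separation technique; the names adaptive_ratio, mix_spatial, select_for_mixing and passes_per_epoch, the clamping bounds 0.25 to 0.75, and the "mix the lowest-loss samples" selection rule are all hypothetical illustrations.

```python
from typing import List, Tuple

Image = List[List[float]]  # hypothetical layout: H x W grayscale image


def adaptive_ratio(loss_a: float, loss_b: float,
                   lo: float = 0.25, hi: float = 0.75) -> float:
    # Hypothetical rule: the input with the larger previous-epoch loss gets a
    # proportionally larger share of the composite; the clamp keeps either
    # input from vanishing. Falls back to an even split if both losses are 0.
    total = loss_a + loss_b
    ratio = 0.5 if total <= 0 else loss_a / total
    return min(max(ratio, lo), hi)


def mix_spatial(x_a: Image, x_b: Image, y_a: List[float], y_b: List[float],
                lam: float) -> Tuple[Image, List[float]]:
    # CutMix-style stand-in: the left `cut` columns come from x_a and the rest
    # from x_b, so the two inputs occupy disjoint spatial regions.
    width = len(x_a[0])
    cut = round(lam * width)
    x = [row_a[:cut] + row_b[cut:] for row_a, row_b in zip(x_a, x_b)]
    # Composite label weighted by the area each input actually occupies.
    w = cut / width
    y = [w * a + (1.0 - w) * b for a, b in zip(y_a, y_b)]
    return x, y


def select_for_mixing(losses: List[float], fraction: float) -> List[int]:
    # Hypothetical heuristic: mix the samples the model already fits well
    # (lowest previous-epoch loss); the rest are trained on individually.
    k = int(len(losses) * fraction)
    return sorted(range(len(losses)), key=lambda i: losses[i])[:k]


def passes_per_epoch(n: int, mixed_fraction: float, k: int = 2) -> int:
    # Unmixed inputs cost one pass each; the mixed subset is grouped k at a
    # time (incomplete groups are left unmixed), costing n_mixed // k passes.
    n_mixed = int(n * mixed_fraction)
    n_mixed -= n_mixed % k
    return (n - n_mixed) + n_mixed // k
```

For example, passes_per_epoch(50000, 0.5) returns 37500: mixing half of CIFAR-10's 50,000 training images in pairs cuts the ideal per-epoch pass count by a factor of about 1.33, before accounting for any mixing overhead. This arithmetic is why the choice of how much of the dataset to mix, and which samples, directly controls the achievable speedup.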

Source journal: Frontiers in Artificial Intelligence
CiteScore: 6.10
Self-citation rate: 2.50%
Articles published: 272
Review time: 13 weeks