GrOD: Deep Learning with Gradients Orthogonal Decomposition for Knowledge Transfer, Distillation, and Adversarial Training

Haoyi Xiong, Ruosi Wan, Jian Zhao, Zeyu Chen, Xingjian Li, Zhanxing Zhu, Jun Huan
{"title":"GrOD: Deep Learning with Gradients Orthogonal Decomposition for Knowledge Transfer, Distillation, and Adversarial Training","authors":"Haoyi Xiong, Ruosi Wan, Jian Zhao, Zeyu Chen, Xingjian Li, Zhanxing Zhu, Jun Huan","doi":"10.1145/3530836","DOIUrl":null,"url":null,"abstract":"Regularization that incorporates the linear combination of empirical loss and explicit regularization terms as the loss function has been frequently used for many machine learning tasks. The explicit regularization term is designed in different types, depending on its applications. While regularized learning often boost the performance with higher accuracy and faster convergence, the regularization would sometimes hurt the empirical loss minimization and lead to poor performance. To deal with such issues in this work, we propose a novel strategy, namely Gradients Orthogonal Decomposition (GrOD), that improves the training procedure of regularized deep learning. Instead of linearly combining gradients of the two terms, GrOD re-estimates a new direction for iteration that does not hurt the empirical loss minimization while preserving the regularization affects, through orthogonal decomposition. We have performed extensive experiments to use GrOD improving the commonly used algorithms of transfer learning [2], knowledge distillation [3], and adversarial learning [4]. The experiment results based on large datasets, including Caltech 256 [5], MIT indoor 67 [6], CIFAR-10 [7], and ImageNet [8], show significant improvement made by GrOD for all three algorithms in all cases.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Knowledge Discovery from Data (TKDD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3530836","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Regularization that incorporates the linear combination of empirical loss and explicit regularization terms as the loss function has been frequently used for many machine learning tasks. The explicit regularization term is designed in different types, depending on its applications. While regularized learning often boost the performance with higher accuracy and faster convergence, the regularization would sometimes hurt the empirical loss minimization and lead to poor performance. To deal with such issues in this work, we propose a novel strategy, namely Gradients Orthogonal Decomposition (GrOD), that improves the training procedure of regularized deep learning. Instead of linearly combining gradients of the two terms, GrOD re-estimates a new direction for iteration that does not hurt the empirical loss minimization while preserving the regularization affects, through orthogonal decomposition. We have performed extensive experiments to use GrOD improving the commonly used algorithms of transfer learning [2], knowledge distillation [3], and adversarial learning [4]. The experiment results based on large datasets, including Caltech 256 [5], MIT indoor 67 [6], CIFAR-10 [7], and ImageNet [8], show significant improvement made by GrOD for all three algorithms in all cases.
基于梯度正交分解的深度学习,用于知识转移、蒸馏和对抗训练
将经验损失和显式正则化项的线性组合作为损失函数的正则化已经经常用于许多机器学习任务。根据不同的应用,显式正则化项被设计成不同的类型。虽然正则化学习通常以更高的精度和更快的收敛速度提高性能,但正则化有时会损害经验损失最小化并导致性能下降。为了解决这些问题,我们提出了一种新的策略,即梯度正交分解(GrOD),它改进了正则化深度学习的训练过程。GrOD不是线性组合两项的梯度,而是通过正交分解重新估计迭代的新方向,在不损害经验损失最小化的同时保留正则化影响。我们已经进行了大量的实验,使用GrOD改进迁移学习[2]、知识蒸馏[3]和对抗学习[4]等常用算法。基于Caltech 256[5]、MIT indoor 67[6]、CIFAR-10[7]和ImageNet[8]等大型数据集的实验结果表明,在所有情况下,GrOD对这三种算法都有显著的改进。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信