Randomized K-FACs: Speeding up K-FAC with Randomized Numerical Linear Algebra

Ideal · Pub Date: 2022-06-30 · DOI: 10.48550/arXiv.2206.15397
C. Puiu
{"title":"Randomized K-FACs: Speeding up K-FAC with Randomized Numerical Linear Algebra","authors":"C. Puiu","doi":"10.48550/arXiv.2206.15397","DOIUrl":null,"url":null,"abstract":"K-FAC is a successful tractable implementation of Natural Gradient for Deep Learning, which nevertheless suffers from the requirement to compute the inverse of the Kronecker factors (through an eigen-decomposition). This can be very time-consuming (or even prohibitive) when these factors are large. In this paper, we theoretically show that, owing to the exponential-average construction paradigm of the Kronecker factors that is typically used, their eigen-spectrum must decay. We show numerically that in practice this decay is very rapid, leading to the idea that we could save substantial computation by only focusing on the first few eigen-modes when inverting the Kronecker-factors. Importantly, the spectrum decay happens over a constant number of modes irrespectively of the layer width. This allows us to reduce the time complexity of K-FAC from cubic to quadratic in layer width, partially closing the gap w.r.t. SENG (another practical Natural Gradient implementation for Deep learning which scales linearly in width). Randomized Numerical Linear Algebra provides us with the necessary tools to do so. Numerical results show we obtain $\\approx2.5\\times$ reduction in per-epoch time and $\\approx3.3\\times$ reduction in time to target accuracy. We compare our proposed K-FAC sped-up versions SENG, and observe that for CIFAR10 classification with VGG16_bn we perform on par with it.","PeriodicalId":113317,"journal":{"name":"Ideal","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ideal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2206.15397","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

K-FAC is a successful tractable implementation of Natural Gradient for Deep Learning, which nevertheless suffers from the requirement to compute the inverse of the Kronecker factors (through an eigen-decomposition). This can be very time-consuming (or even prohibitive) when these factors are large. In this paper, we theoretically show that, owing to the exponential-average construction paradigm of the Kronecker factors that is typically used, their eigen-spectrum must decay. We show numerically that in practice this decay is very rapid, leading to the idea that we could save substantial computation by focusing only on the first few eigen-modes when inverting the Kronecker factors. Importantly, the spectrum decay happens over a constant number of modes, irrespective of the layer width. This allows us to reduce the time complexity of K-FAC from cubic to quadratic in layer width, partially closing the gap w.r.t. SENG (another practical Natural Gradient implementation for Deep Learning, which scales linearly in width). Randomized Numerical Linear Algebra provides us with the necessary tools to do so. Numerical results show we obtain an $\approx 2.5\times$ reduction in per-epoch time and an $\approx 3.3\times$ reduction in time to target accuracy. We compare our proposed sped-up K-FAC versions with SENG, and observe that for CIFAR10 classification with VGG16_bn we perform on par with it.
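
To make the mechanism concrete, below is a minimal, self-contained Python/NumPy sketch (not the authors' code) of the two ingredients the abstract describes: a Kronecker factor built as an exponential moving average of activation outer products, and a randomized range finder that recovers only the leading eigenmodes so a damped inverse can be formed in roughly O(d^2 k) rather than O(d^3) time. The function names, the rank k, the damping constant, and all hyper-parameters are illustrative assumptions.

import numpy as np

def ema_kronecker_factor(activations, rho=0.95):
    """Exponential moving average of a_t a_t^T, the way K-FAC typically builds
    its Kronecker factors; older terms are down-weighted geometrically, which
    is what makes the eigen-spectrum decay."""
    d = activations[0].shape[0]
    A = np.zeros((d, d))
    for a in activations:
        A = rho * A + (1.0 - rho) * np.outer(a, a)
    return A

def randomized_low_rank_eig(A, k, oversample=10, n_iter=2, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via a randomized range finder
    with a few power iterations (cost ~O(d^2 (k + oversample)), not O(d^3))."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    Omega = rng.standard_normal((d, k + oversample))   # random test matrix
    Y = A @ Omega
    for _ in range(n_iter):                            # power iterations sharpen the range estimate
        Y = A @ (A @ Y)
    Q, _ = np.linalg.qr(Y)                             # orthonormal basis for the approximate range
    B = Q.T @ A @ Q                                    # small (k + oversample) square projection
    evals, V = np.linalg.eigh(B)                       # eigh returns ascending eigenvalues
    idx = np.argsort(evals)[::-1][:k]                  # keep the k largest
    return evals[idx], Q @ V[:, idx]

def approx_damped_inverse(A, k, damping=1e-2):
    """(A + damping*I)^{-1} restricted to the leading eigenmodes; the orthogonal
    complement is treated as pure damping. In practice one would apply this to
    vectors without forming the full d x d matrix."""
    d = A.shape[0]
    lam, U = randomized_low_rank_eig(A, k)
    inv_core = U @ np.diag(1.0 / (lam + damping)) @ U.T
    return inv_core + (1.0 / damping) * (np.eye(d) - U @ U.T)

# Toy usage: a 512-wide layer, 1000 activation samples, rank-32 approximation.
rng = np.random.default_rng(0)
acts = [rng.standard_normal(512) for _ in range(1000)]
A = ema_kronecker_factor(acts)
A_inv_approx = approx_damped_inverse(A, k=32)

Because the number of eigenmodes that matter stays roughly constant as the layer width d grows, the per-factor cost of this low-rank inversion scales quadratically in d, which is the source of the speed-up claimed in the abstract.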