Abstract of the Keynotes

I. Dhillon
{"title":"Abstract of the Keynotes","authors":"I. Dhillon","doi":"10.1109/ic3.2019.8844903","DOIUrl":null,"url":null,"abstract":"Despite having remarkable performance on many sequence learning tasks, recurrent neural networks (RNNs) are hard to train with long sequences due to limited expressive power and the vanishing and exploding gradient issues. Previous work has focused on stabilizing the gradients by encouraging orthogonality of weight matrices via re-parameterization techniques. However, two major issues remain in these methods. First, the re-parameterization often relies on a chain of operations on small matrices or vectors that are not friendly to hardware accelerators. As a result, it becomes a source of performance bottleneck for training. Second, these methods fix the singular values of the transition matrix throughout the temporal dimension, which further restricts the expressive power of the model and wastes the potential of encoding useful information into the singular values. In this talk, I will present the Singular Value Gated RNN that can efficiently encode temporal information into singular values, as well as mitigate the vanishing and exploding gradient problems. In addition, we can design novel forward and backward propagation algorithms that are friendly to hardware accelerators. This leads to 3-4 times speedup on GPUs and greatly reduces memory cost. On contemporary applications like voice recognition and text summarization, where long term dependencies are hard to capture, the proposed method outperforms other recurrent models with similar or smaller model sizes. Joint work with Jiong Zhang of UT Austin","PeriodicalId":72026,"journal":{"name":"... International Conference on Contemporary Computing. IC3 (Conference)","volume":"1 1","pages":"1-2"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"... International Conference on Contemporary Computing. IC3 (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ic3.2019.8844903","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Despite their remarkable performance on many sequence learning tasks, recurrent neural networks (RNNs) are hard to train on long sequences due to limited expressive power and the vanishing and exploding gradient problems. Previous work has focused on stabilizing the gradients by encouraging orthogonality of the weight matrices via re-parameterization techniques. However, two major issues remain in these methods. First, the re-parameterization often relies on a chain of operations on small matrices or vectors that is not friendly to hardware accelerators; as a result, it becomes a performance bottleneck during training. Second, these methods fix the singular values of the transition matrix across the temporal dimension, which further restricts the expressive power of the model and forgoes the opportunity to encode useful information into the singular values. In this talk, I will present the Singular Value Gated RNN, which can efficiently encode temporal information into the singular values while mitigating the vanishing and exploding gradient problems. In addition, we design novel forward and backward propagation algorithms that are friendly to hardware accelerators. This yields a 3-4x speedup on GPUs and greatly reduces memory cost. On contemporary applications such as voice recognition and text summarization, where long-term dependencies are hard to capture, the proposed method outperforms other recurrent models at similar or smaller model sizes. Joint work with Jiong Zhang of UT Austin.
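
The abstract does not spell out the parameterization, but the idea of gating singular values can be pictured with a small sketch. Below is a minimal, illustrative NumPy implementation (not the talk's actual algorithm) of a recurrent step whose transition matrix is expressed through its SVD, W_t = U diag(s_t) V^T, where U and V are fixed orthogonal factors and the singular values s_t are gated by the current input; all names here (svd_gated_rnn_step, gate_W, gate_b, in_W) are hypothetical.

```python
# Illustrative sketch only: an RNN step with an SVD-parameterized transition
# matrix W_t = U diag(s_t) V^T whose singular values s_t are gated by the input.
# This is an assumption-based reconstruction, not the method from the talk.
import numpy as np

def orthogonal(n, rng):
    """Return a random n x n orthogonal matrix via QR decomposition."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def svd_gated_rnn_step(h, x, U, V, gate_W, gate_b, in_W):
    """One recurrent step with input-gated singular values."""
    # Gate the singular values with the current input, keeping them in (0, 1).
    s_t = sigmoid(gate_W @ x + gate_b)
    # Apply W_t = U diag(s_t) V^T to the hidden state without forming W_t.
    Wh = U @ (s_t * (V.T @ h))
    return np.tanh(Wh + in_W @ x)

rng = np.random.default_rng(0)
d_h, d_x, T = 8, 4, 20
U, V = orthogonal(d_h, rng), orthogonal(d_h, rng)   # fixed orthogonal factors
gate_W = 0.1 * rng.standard_normal((d_h, d_x))       # hypothetical gate parameters
gate_b = np.zeros(d_h)
in_W = 0.1 * rng.standard_normal((d_h, d_x))         # input-to-hidden weights

h = np.zeros(d_h)
for t in range(T):
    x_t = rng.standard_normal(d_x)
    h = svd_gated_rnn_step(h, x_t, U, V, gate_W, gate_b, in_W)
print(h[:3])  # hidden state after T steps
```

Keeping s_t in (0, 1) bounds the spectral norm of each transition matrix by 1, which is one simple way to keep gradients through time controlled; the accelerator-friendly forward and backward algorithms mentioned in the abstract are not reproduced in this sketch.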