{"title":"Abstract of the Keynotes","authors":"I. Dhillon","doi":"10.1109/ic3.2019.8844903","DOIUrl":null,"url":null,"abstract":"Despite having remarkable performance on many sequence learning tasks, recurrent neural networks (RNNs) are hard to train with long sequences due to limited expressive power and the vanishing and exploding gradient issues. Previous work has focused on stabilizing the gradients by encouraging orthogonality of weight matrices via re-parameterization techniques. However, two major issues remain in these methods. First, the re-parameterization often relies on a chain of operations on small matrices or vectors that are not friendly to hardware accelerators. As a result, it becomes a source of performance bottleneck for training. Second, these methods fix the singular values of the transition matrix throughout the temporal dimension, which further restricts the expressive power of the model and wastes the potential of encoding useful information into the singular values. In this talk, I will present the Singular Value Gated RNN that can efficiently encode temporal information into singular values, as well as mitigate the vanishing and exploding gradient problems. In addition, we can design novel forward and backward propagation algorithms that are friendly to hardware accelerators. This leads to 3-4 times speedup on GPUs and greatly reduces memory cost. On contemporary applications like voice recognition and text summarization, where long term dependencies are hard to capture, the proposed method outperforms other recurrent models with similar or smaller model sizes. Joint work with Jiong Zhang of UT Austin","PeriodicalId":72026,"journal":{"name":"... International Conference on Contemporary Computing. IC3 (Conference)","volume":"1 1","pages":"1-2"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"... International Conference on Contemporary Computing. IC3 (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ic3.2019.8844903","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Despite their remarkable performance on many sequence learning tasks, recurrent neural networks (RNNs) are hard to train on long sequences due to limited expressive power and the vanishing and exploding gradient problems. Previous work has focused on stabilizing the gradients by encouraging orthogonality of the weight matrices via re-parameterization techniques. However, two major issues remain in these methods. First, the re-parameterization often relies on a chain of operations on small matrices or vectors that are not friendly to hardware accelerators; as a result, it becomes a performance bottleneck during training. Second, these methods fix the singular values of the transition matrix across the temporal dimension, which further restricts the expressive power of the model and wastes the potential of encoding useful information into the singular values. In this talk, I will present the Singular Value Gated RNN, which can efficiently encode temporal information into the singular values while mitigating the vanishing and exploding gradient problems. In addition, we design novel forward and backward propagation algorithms that are friendly to hardware accelerators. This yields a 3-4x speedup on GPUs and greatly reduces memory cost. On contemporary applications such as voice recognition and text summarization, where long-term dependencies are hard to capture, the proposed method outperforms other recurrent models of similar or smaller size. This is joint work with Jiong Zhang of UT Austin.
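The abstract only outlines the construction, but the core idea it describes, a recurrent transition matrix factored as U diag(s_t) V^T with orthogonal U and V and input-gated singular values s_t, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the sigmoid gating form, the class and parameter names, and the use of PyTorch's orthogonal re-parameterization are all assumptions, and the hardware-friendly propagation algorithms mentioned in the talk are not reproduced here.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


class SVDGatedRNNCell(nn.Module):
    """Sketch of a recurrent cell whose transition matrix is U diag(s_t) V^T,
    with orthogonal U, V and input-dependent singular values s_t (assumed form)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Orthogonal factors, kept orthogonal by PyTorch's built-in re-parameterization.
        self.U = orthogonal(nn.Linear(hidden_size, hidden_size, bias=False))
        self.V = orthogonal(nn.Linear(hidden_size, hidden_size, bias=False))
        self.W_in = nn.Linear(input_size, hidden_size)   # input projection
        self.gate = nn.Linear(input_size, hidden_size)   # produces per-step singular values

    def forward(self, x_t, h_prev):
        # Singular values gated into (0, 1) by the current input (assumed gating form);
        # bounding them keeps the transition's spectral norm below 1, taming exploding gradients.
        s_t = torch.sigmoid(self.gate(x_t))
        # Apply U diag(s_t) V^T to the previous hidden state: each Linear applies its
        # orthogonal weight matrix, so treating V's weight as V^T gives the factored product.
        recurrent = self.U(s_t * self.V(h_prev))
        return torch.tanh(recurrent + self.W_in(x_t))


# Minimal usage: unroll the cell over a toy sequence.
cell = SVDGatedRNNCell(input_size=8, hidden_size=32)
x = torch.randn(50, 4, 8)        # (time, batch, features)
h = torch.zeros(4, 32)
for t in range(x.size(0)):
    h = cell(x[t], h)
```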