{"title":"Abstract of the Keynotes","authors":"I. Dhillon","doi":"10.1109/ic3.2019.8844903","DOIUrl":null,"url":null,"abstract":"Despite having remarkable performance on many sequence learning tasks, recurrent neural networks (RNNs) are hard to train with long sequences due to limited expressive power and the vanishing and exploding gradient issues. Previous work has focused on stabilizing the gradients by encouraging orthogonality of weight matrices via re-parameterization techniques. However, two major issues remain in these methods. First, the re-parameterization often relies on a chain of operations on small matrices or vectors that are not friendly to hardware accelerators. As a result, it becomes a source of performance bottleneck for training. Second, these methods fix the singular values of the transition matrix throughout the temporal dimension, which further restricts the expressive power of the model and wastes the potential of encoding useful information into the singular values. In this talk, I will present the Singular Value Gated RNN that can efficiently encode temporal information into singular values, as well as mitigate the vanishing and exploding gradient problems. In addition, we can design novel forward and backward propagation algorithms that are friendly to hardware accelerators. This leads to 3-4 times speedup on GPUs and greatly reduces memory cost. On contemporary applications like voice recognition and text summarization, where long term dependencies are hard to capture, the proposed method outperforms other recurrent models with similar or smaller model sizes. Joint work with Jiong Zhang of UT Austin","PeriodicalId":72026,"journal":{"name":"... International Conference on Contemporary Computing. IC3 (Conference)","volume":"1 1","pages":"1-2"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"... International Conference on Contemporary Computing. IC3 (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ic3.2019.8844903","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Despite their remarkable performance on many sequence learning tasks, recurrent neural networks (RNNs) are hard to train on long sequences due to limited expressive power and the vanishing and exploding gradient problems. Previous work has focused on stabilizing the gradients by encouraging orthogonality of the weight matrices via re-parameterization techniques. However, two major issues remain in these methods. First, the re-parameterization often relies on a chain of operations on small matrices or vectors that are not friendly to hardware accelerators; as a result, it becomes a performance bottleneck during training. Second, these methods fix the singular values of the transition matrix across the temporal dimension, which further restricts the expressive power of the model and wastes the potential of encoding useful information into the singular values. In this talk, I will present the Singular Value Gated RNN, which can efficiently encode temporal information into the singular values while mitigating the vanishing and exploding gradient problems. In addition, we design novel forward and backward propagation algorithms that are friendly to hardware accelerators. This yields a 3-4x speedup on GPUs and greatly reduces memory cost. On contemporary applications such as voice recognition and text summarization, where long-term dependencies are hard to capture, the proposed method outperforms other recurrent models of similar or smaller size. This is joint work with Jiong Zhang of UT Austin.
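The abstract only outlines the construction, but the core idea it describes, a recurrent transition matrix factored as U diag(s_t) V^T with orthogonal U and V and input-gated singular values s_t, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the sigmoid gating form, the class and parameter names, and the use of PyTorch's orthogonal re-parameterization are all assumptions, and the hardware-friendly propagation algorithms mentioned in the talk are not reproduced here.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


class SVDGatedRNNCell(nn.Module):
    """Sketch of a recurrent cell whose transition matrix is U diag(s_t) V^T,
    with orthogonal U, V and input-dependent singular values s_t (assumed form)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Orthogonal factors, kept orthogonal by PyTorch's built-in re-parameterization.
        self.U = orthogonal(nn.Linear(hidden_size, hidden_size, bias=False))
        self.V = orthogonal(nn.Linear(hidden_size, hidden_size, bias=False))
        self.W_in = nn.Linear(input_size, hidden_size)   # input projection
        self.gate = nn.Linear(input_size, hidden_size)   # produces per-step singular values

    def forward(self, x_t, h_prev):
        # Singular values gated into (0, 1) by the current input (assumed gating form);
        # bounding them keeps the transition's spectral norm below 1, taming exploding gradients.
        s_t = torch.sigmoid(self.gate(x_t))
        # Apply U diag(s_t) V^T to the previous hidden state: each Linear applies its
        # orthogonal weight matrix, so treating V's weight as V^T gives the factored product.
        recurrent = self.U(s_t * self.V(h_prev))
        return torch.tanh(recurrent + self.W_in(x_t))


# Minimal usage: unroll the cell over a toy sequence.
cell = SVDGatedRNNCell(input_size=8, hidden_size=32)
x = torch.randn(50, 4, 8)        # (time, batch, features)
h = torch.zeros(4, 32)
for t in range(x.size(0)):
    h = cell(x[t], h)
```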