Long Short-Term Memory with Slower Information Decay
H. Chien, Javier Turek, Nicole M. Beckage, Vy A. Vo, C. Honey, Ted L. Willke
LatinX in AI at International Conference on Machine Learning 2021. DOI: 10.52591/2021072418
Abstract
Learning to process long-range dependencies has been a challenge for recurrent neural networks. Despite the improvements achieved by long short-term memory (LSTM) networks, their gating mechanism results in exponential decay of information, limiting their capacity to capture long-range dependencies. In this work, we present a power law forget gate, which instead has a slower rate of information decay. We propose a power law-based LSTM (pLSTM), built on the LSTM but with a power law forget gate. We empirically test the pLSTM on the copy task, sentiment classification, and sequential MNIST, all tasks with long-range dependencies. The pLSTM solves these tasks, outperforming the LSTM, especially for long-range dependencies. Further, the pLSTM learns sparser and more robust representations.
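The abstract contrasts the exponential information decay induced by a standard sigmoid forget gate with the slower, power-law decay the pLSTM is designed to achieve. The following minimal sketch (not the authors' formulation; the forget value f and exponent p are illustrative assumptions) shows why this matters: repeated multiplication by a constant forget gate shrinks stored information as f^t, while a power-law schedule t^(-p) retains far more signal at long lags.

```
# Minimal sketch (assumptions, not the paper's pLSTM equations):
# compare exponential decay from a constant LSTM forget gate with a
# power-law decay schedule of the kind the abstract describes.
import numpy as np

T = 100                       # number of time steps
f = 0.9                       # constant sigmoid forget-gate value (assumption)
p = 0.5                       # power-law decay exponent (assumption)

steps = np.arange(1, T + 1)

# Standard LSTM: repeatedly multiplying the cell state by f gives
# c_t = f**t * c_0, i.e. exponential decay of the stored information.
exp_decay = f ** steps

# Power-law decay: information shrinks as t**(-p), which decreases
# much more slowly for large t, preserving long-range dependencies.
power_decay = steps.astype(float) ** (-p)

for t in (1, 10, 100):
    print(f"t={t:3d}  exponential={f**t:.2e}  power-law={t**(-p):.2e}")
```

At t = 100, the exponential term is on the order of 1e-5 while the power-law term is still 0.1, illustrating the slower information decay the pLSTM's forget gate is intended to provide.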