Skipping RNN State Updates without Retraining the Original Model

Jin Tao, Urmish Thakker, Ganesh S. Dasika, Jesse G. Beu
{"title":"Skipping RNN State Updates without Retraining the Original Model","authors":"Jin Tao, Urmish Thakker, Ganesh S. Dasika, Jesse G. Beu","doi":"10.1145/3362743.3362965","DOIUrl":null,"url":null,"abstract":"Recurrent Neural Networks (RNNs) break a time-series input (or a sentence) into multiple time-steps (or words) and process it one time-step (word) at a time. However, not all of these time-steps (words) need to be processed to determine the final output accurately. Prior work has exploited this intuition by incorporating an additional predictor in front of the RNN model to prune time-steps that are not relevant. However, they jointly train the predictor and the RNN model, allowing one to learn from the mistakes of the other. In this work we present a method to skip RNN time-steps without retraining or fine tuning the original RNN model. Using an ideal predictor, we show that even without retraining the original model, we can train a predictor to skip 45% of steps for the SST dataset and 80% of steps for the IMDB dataset without impacting the model accuracy. We show that the decision to skip is not easy by comparing against 5 different baselines based on solutions derived from domain knowledge. Finally, we present a case study about the cost and accuracy benefits of realizing such a predictor. This realistic predictor on the SST dataset is able to reduce the computation by more than 25% with at most 0.3% loss in accuracy while being 40× smaller than the original RNN model.","PeriodicalId":425595,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning on Edge in Sensor Systems","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st Workshop on Machine Learning on Edge in Sensor Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3362743.3362965","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

Recurrent Neural Networks (RNNs) break a time-series input (or a sentence) into multiple time-steps (or words) and process it one time-step (word) at a time. However, not all of these time-steps (words) need to be processed to determine the final output accurately. Prior work has exploited this intuition by placing an additional predictor in front of the RNN model to prune time-steps that are not relevant. However, these approaches jointly train the predictor and the RNN model, allowing each to learn from the mistakes of the other. In this work we present a method to skip RNN time-steps without retraining or fine-tuning the original RNN model. Using an ideal predictor, we show that even without retraining the original model, a predictor can be trained to skip 45% of steps on the SST dataset and 80% of steps on the IMDB dataset without impacting model accuracy. We show that the decision to skip is not easy by comparing against 5 different baselines derived from domain knowledge. Finally, we present a case study of the cost and accuracy benefits of realizing such a predictor. On the SST dataset, this realistic predictor reduces computation by more than 25% with at most 0.3% loss in accuracy while being 40× smaller than the original RNN model.
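To make the general setup concrete, below is a minimal PyTorch sketch of the kind of architecture the abstract describes: a small predictor placed in front of a frozen, pre-trained RNN cell that decides, per time-step, whether to run the state update or carry the previous state forward unchanged. The names (SkipPredictor, run_with_skipping), the predictor architecture, and the 0.5 threshold are illustrative assumptions, not the paper's actual implementation.

```python
# Conceptual sketch only: a lightweight skip predictor in front of a frozen RNN.
import torch
import torch.nn as nn

class SkipPredictor(nn.Module):
    """Tiny network that decides, per time-step, whether the RNN should update."""
    def __init__(self, input_size, hidden_size, pred_hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size + hidden_size, pred_hidden),
            nn.ReLU(),
            nn.Linear(pred_hidden, 1),
        )

    def forward(self, x_t, h_t):
        # Probability that this time-step is worth processing.
        return torch.sigmoid(self.net(torch.cat([x_t, h_t], dim=-1)))

def run_with_skipping(rnn_cell, predictor, inputs, threshold=0.5):
    """Run a frozen LSTMCell over `inputs` (seq_len, batch, input_size),
    copying the state forward whenever the predictor votes to skip."""
    batch = inputs.size(1)
    h = torch.zeros(batch, rnn_cell.hidden_size)
    c = torch.zeros(batch, rnn_cell.hidden_size)
    for x_t in inputs:                                    # iterate over time-steps
        keep = (predictor(x_t, h) > threshold).float()    # 1 = process, 0 = skip
        h_new, c_new = rnn_cell(x_t, (h, c))              # state update for kept steps
        h = keep * h_new + (1.0 - keep) * h               # skipped steps reuse old state
        c = keep * c_new + (1.0 - keep) * c
    return h

# Usage: the RNN cell is pre-trained and frozen; only the predictor would be trained.
cell = nn.LSTMCell(input_size=300, hidden_size=128)
for p in cell.parameters():
    p.requires_grad_(False)
predictor = SkipPredictor(input_size=300, hidden_size=128)
final_state = run_with_skipping(cell, predictor, torch.randn(20, 4, 300))
```

In this sketch only the predictor's parameters carry gradients, which mirrors the paper's key constraint that the original RNN is never retrained or fine-tuned; how the predictor itself is trained (and how skipped steps are handled during that training) is left out here.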