Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow
K. Sim, A. Narayanan, Tom Bagby, Tara N. Sainath, M. Bacchiani
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2017. DOI: 10.1109/ASRU.2017.8268944
Sequence-level losses are commonly used to train deep neural network acoustic models for automatic speech recognition. The forward-backward algorithm is used to efficiently compute the gradients of the sequence loss with respect to the model parameters. Gradient-based optimization is used to minimize these losses. Recent work has shown that the forward-backward algorithm can be efficiently implemented as a series of matrix operations. This paper further improves the forward-backward algorithm via batched computation, a technique commonly used to improve training speed by exploiting the parallel computation of matrix multiplication. Specifically, we show how batched computation of the forward-backward algorithm can be efficiently implemented using TensorFlow to handle variable-length sequences within a mini batch. Furthermore, we also show how the batched forward-backward computation can be used to compute the gradients of the connectionist temporal classification (CTC) and maximum mutual information (MMI) losses with respect to the logits. We show, via empirical benchmarks, that the batched forward-backward computation can speed up the CTC loss and gradient computation by about 183 times when run on GPU with a batch size of 256 compared to using a batch size of 1; and by about 22 times for lattice-free MMI using a trigram phone language model for the denominator.
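The batched forward recursion described in the abstract can be sketched with standard TensorFlow ops. The snippet below is a minimal illustration, not the authors' implementation: it runs the forward pass in log space for a whole padded mini-batch at once and uses the sequence lengths to freeze frames past each sequence's true length, so variable-length sequences can share one padded tensor. The function and argument names (`batched_forward_log_alpha`, `log_trans`, `log_obs`, `seq_lens`) are illustrative assumptions.

```python
# A minimal sketch of a batched forward pass in log space; an illustration of
# the general idea, not the implementation from the paper. The input names
# below are assumptions for this example.
import tensorflow as tf


def batched_forward_log_alpha(log_trans, log_obs, seq_lens):
    """Forward log-probabilities for a padded batch of variable-length sequences.

    Args:
      log_trans: [S, S] log transition matrix, log_trans[i, j] = log p(j | i).
      log_obs:   [B, T, S] per-frame log observation scores (e.g. log-softmax
                 of the acoustic-model logits), padded along the time axis.
      seq_lens:  [B] true lengths of the padded sequences.

    Returns:
      [B, T, S] forward log-probabilities; frames past a sequence's true
      length simply repeat the last valid value and can be masked out later.
    """
    num_states = tf.shape(log_obs)[2]
    T = log_obs.shape[1]  # assume a statically known (padded) time dimension

    # Uniform initial state prior for the first frame.
    log_alpha_t = log_obs[:, 0, :] - tf.math.log(tf.cast(num_states, tf.float32))
    alphas = [log_alpha_t]

    for t in range(1, T):
        # Batched matrix step in log space:
        # new[b, j] = logsumexp_i(log_alpha[b, i] + log_trans[i, j]) + log_obs[b, t, j]
        scores = log_alpha_t[:, :, None] + log_trans[None, :, :]  # [B, S, S]
        new_alpha = tf.reduce_logsumexp(scores, axis=1) + log_obs[:, t, :]
        # Only update sequences whose true length has not yet been reached.
        still_active = (t < seq_lens)[:, None]  # [B, 1] boolean mask
        log_alpha_t = tf.where(still_active, new_alpha, log_alpha_t)
        alphas.append(log_alpha_t)

    return tf.stack(alphas, axis=1)  # [B, T, S]
```

The same batched recursion run backwards in time yields the beta terms, and the per-sequence log-likelihood is the logsumexp over states of the alpha values at each sequence's final valid frame; combining the alpha and beta quantities is what then gives the gradients of losses such as CTC and MMI with respect to the logits, as the paper describes.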