Online Speech Decoding Optimization Strategy with Viterbi Algorithm on GPU
Authors: Alfonsus Raditya Arsadjaja, Achmad Imam Kistijantoro
Venue: 2018 5th International Conference on Advanced Informatics: Concept Theory and Applications (ICAICTA)
Published: 2018-08-01
DOI: 10.1109/ICAICTA.2018.8541343 (https://doi.org/10.1109/ICAICTA.2018.8541343)
Automatic Speech Recognition (ASR) has become popular recently, but current speech recognition algorithms are slow, and a faster way to recognize speech is needed. One way to achieve this is with a GPU, which provides parallel computation; however, ASR is hard to parallelize directly. This paper describes how to build a parallel ASR system, which requires several steps. First, we must convert the data structures to make them compatible with the GPU; then we must write several kernels that are equivalent to the serial algorithm on the CPU. Once the GPU program is correct, we describe several optimization strategies that make ASR run much faster. These strategies are based on profiling results and an analysis of the GPU program's execution flow. Our best implementation achieves a speedup of around 5.59-6.18 times over the serial CPU implementation.
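To make the two preparation steps concrete, the following is a minimal sketch and not the authors' implementation. It assumes the decoding graph is given as a list of (source, destination, log-weight) arcs, flattens it into CSR-style arrays grouped by destination state (one plausible GPU-friendly layout; the abstract does not say which layout the paper uses), and writes the per-frame Viterbi update so that each destination state is computed independently, as one GPU thread might do. The function names `to_flat_arrays`, `viterbi_step`, and `viterbi_decode` are hypothetical, and the code runs serially on the CPU.

```python
NEG_INF = float("-inf")


def to_flat_arrays(num_states, arcs):
    """Flatten (src, dst, log_weight) arcs into CSR-style arrays grouped by dst.

    Assumed layout: the incoming arcs of state d occupy the index range
    offsets[d] .. offsets[d + 1] - 1 in `sources` and `weights`.
    """
    by_dst = [[] for _ in range(num_states)]
    for src, dst, weight in arcs:
        by_dst[dst].append((src, weight))
    offsets, sources, weights = [0], [], []
    for incoming in by_dst:
        for src, weight in incoming:
            sources.append(src)
            weights.append(weight)
        offsets.append(len(sources))
    return offsets, sources, weights


def viterbi_step(offsets, sources, weights, prev_score, emit):
    """One frame of the Viterbi recursion in the log domain.

    Each iteration of the outer loop reads prev_score and writes only its own
    slot of score/backptr, so the iterations do not depend on one another.
    """
    num_states = len(offsets) - 1
    score = [NEG_INF] * num_states
    backptr = [-1] * num_states
    for dst in range(num_states):  # conceptually: one thread per state
        best, best_src = NEG_INF, -1
        for k in range(offsets[dst], offsets[dst + 1]):
            candidate = prev_score[sources[k]] + weights[k]
            if candidate > best:
                best, best_src = candidate, sources[k]
        if best_src >= 0:
            score[dst] = best + emit[dst]
            backptr[dst] = best_src
    return score, backptr


def viterbi_decode(num_states, arcs, init, emissions):
    """Run the frame updates in order, then backtrace the best state path."""
    offsets, sources, weights = to_flat_arrays(num_states, arcs)
    score = [init[s] + emissions[0][s] for s in range(num_states)]
    history = []
    for emit in emissions[1:]:
        score, backptr = viterbi_step(offsets, sources, weights, score, emit)
        history.append(backptr)
    state = max(range(num_states), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for backptr in reversed(history):
        state = backptr[state]
        path.append(state)
    path.reverse()
    return path, best_score
```

The frames still have to be processed one after another, because each frame's scores depend on the previous frame's; only the work within a frame is independent across states. That dependency is one reason decoding of this kind is hard to parallelize directly.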