WaveNet-Based Zero-Delay Lossless Speech Coding

Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda

2018 IEEE Spoken Language Technology Workshop (SLT), December 2018. DOI: 10.1109/SLT.2018.8639598
This paper presents a WaveNet-based zero-delay lossless speech coding technique for high-quality communications. The WaveNet generative model, a state-of-the-art model for neural-network-based speech waveform synthesis, is used in both the encoder and the decoder. In the encoder, discrete speech signals are losslessly compressed by sample-by-sample entropy coding. The decoder fully reconstructs the original speech signals from the compressed signals without algorithmic delay. Experimental results show that the proposed technique can transmit speech waveforms at 50% of their original bit rate and that the WaveNet-based speech coder remains effective for unknown speakers.
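A minimal Python sketch of the sample-by-sample scheme described above follows. It is not the authors' implementation. Assumptions: samples are 8-bit integers (0 to 255); the hypothetical `toy_predictor` stands in for WaveNet and returns integer cumulative frequencies computed only from past samples; and the entropy coder is a textbook 32-bit integer arithmetic coder in the Witten-Neal-Cleary style, chosen here because the abstract only says "entropy coding". What it illustrates is that the encoder and decoder run the same model on the same past samples, so the decoder recovers each sample from the bitstream and its own earlier outputs, never from future samples.

```python
import bisect
from typing import List, Sequence

# Integer arithmetic-coder constants (32-bit state, Witten-Neal-Cleary style).
PRECISION = 32
FULL = 1 << PRECISION
HALF = FULL >> 1
QUARTER = FULL >> 2
MASK = FULL - 1
LEVELS = 256  # assumption: 8-bit quantized samples (0..255)


def toy_predictor(history: Sequence[int]) -> List[int]:
    """Hypothetical WaveNet stand-in: cumulative integer frequencies over
    LEVELS symbols, computed from past samples only."""
    if len(history) >= 2:
        centre = min(max(2 * history[-1] - history[-2], 0), LEVELS - 1)
    elif history:
        centre = history[-1]
    else:
        centre = LEVELS // 2
    # Peak at the extrapolated value; every level keeps frequency >= 1.
    cum = [0]
    for k in range(LEVELS):
        cum.append(cum[-1] + 1 + (4096 >> abs(k - centre)))
    return cum


def encode(samples: Sequence[int]) -> List[int]:
    """Arithmetic-code the samples one at a time; returns a list of bits."""
    low, high, pending = 0, MASK, 0
    bits: List[int] = []

    def emit(bit: int) -> None:
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)  # resolve deferred middle-half bits
        pending = 0

    for n, x in enumerate(samples):
        cum = toy_predictor(samples[:n])  # condition on past samples only
        total, span = cum[-1], high - low + 1
        high = low + span * cum[x + 1] // total - 1
        low = low + span * cum[x] // total
        while True:  # renormalize, shifting out bits that are settled
            if high < HALF:
                emit(0)
            elif low >= HALF:
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                pending += 1
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
    pending += 1  # flush: enough bits to pin a value inside [low, high]
    emit(0 if low < QUARTER else 1)
    return bits


def decode(bits: Sequence[int], num_samples: int) -> List[int]:
    """Rebuild the samples by running the same model on decoded history."""
    stream = iter(bits)

    def read() -> int:
        return next(stream, 0)  # pad with zeros past the end

    low, high, value = 0, MASK, 0
    for _ in range(PRECISION):
        value = (value << 1) | read()
    out: List[int] = []
    for _ in range(num_samples):
        cum = toy_predictor(out)  # identical model and history as encoder
        total, span = cum[-1], high - low + 1
        target = ((value - low + 1) * total - 1) // span
        x = bisect.bisect_right(cum, target) - 1
        out.append(x)
        high = low + span * cum[x + 1] // total - 1
        low = low + span * cum[x] // total
        while True:  # mirror the encoder's renormalization
            if high < HALF:
                pass
            elif low >= HALF:
                low, high, value = low - HALF, high - HALF, value - HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                low, high, value = low - QUARTER, high - QUARTER, value - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = (value << 1) | read()
    return out


# Usage: an 8-bit triangle wave round-trips exactly.
signal = [128 + (n % 80 if n % 80 < 40 else 80 - n % 80) for n in range(400)]
assert decode(encode(signal), len(signal)) == signal
```

Decoding is exact whatever the predictor is; only the bit rate depends on how sharply the model predicts each next sample, which is where a strong model such as WaveNet would pay off.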