基于门控递归单元预测器模型的自适应差分脉冲编码调制语音解码器

IF 1.9 3区计算机科学 Q2 ACOUSTICS

Eurasip Journal on Audio Speech and Music Processing Pub Date : 2024-01-20 DOI:10.1186/s13636-023-00325-3

Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael Kimwele, Adane Mamuye

{"title":"基于门控递归单元预测器模型的自适应差分脉冲编码调制语音解码器","authors":"Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael Kimwele, Adane Mamuye","doi":"10.1186/s13636-023-00325-3","DOIUrl":null,"url":null,"abstract":"Speech coding is a method to reduce the amount of data needs to represent speech signals by exploiting the statistical properties of the speech signal. Recently, in the speech coding process, a neural network prediction model has gained attention as the reconstruction process of a nonlinear and nonstationary speech signal. This study proposes a novel approach to improve speech coding performance by using a gated recurrent unit (GRU)-based adaptive differential pulse code modulation (ADPCM) system. This GRU predictor model is trained using a data set of speech samples from the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus actual sample and the ADPCM fixed-predictor output speech sample. Our contribution lies in the development of an algorithm for training the GRU predictive model that can improve its performance in speech coding prediction and a new offline trained predictive model for speech decoder. The results indicate that the proposed system significantly improves the accuracy of speech prediction, demonstrating its potential for speech prediction applications. Overall, this work presents a unique application of the GRU predictive model with ADPCM decoding in speech signal compression, providing a promising approach for future research in this field.","PeriodicalId":49202,"journal":{"name":"Eurasip Journal on Audio Speech and Music Processing","volume":"85 1","pages":""},"PeriodicalIF":1.9000,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder\",\"authors\":\"Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael Kimwele, Adane Mamuye\",\"doi\":\"10.1186/s13636-023-00325-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speech coding is a method to reduce the amount of data needs to represent speech signals by exploiting the statistical properties of the speech signal. Recently, in the speech coding process, a neural network prediction model has gained attention as the reconstruction process of a nonlinear and nonstationary speech signal. This study proposes a novel approach to improve speech coding performance by using a gated recurrent unit (GRU)-based adaptive differential pulse code modulation (ADPCM) system. This GRU predictor model is trained using a data set of speech samples from the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus actual sample and the ADPCM fixed-predictor output speech sample. Our contribution lies in the development of an algorithm for training the GRU predictive model that can improve its performance in speech coding prediction and a new offline trained predictive model for speech decoder. The results indicate that the proposed system significantly improves the accuracy of speech prediction, demonstrating its potential for speech prediction applications. Overall, this work presents a unique application of the GRU predictive model with ADPCM decoding in speech signal compression, providing a promising approach for future research in this field.\",\"PeriodicalId\":49202,\"journal\":{\"name\":\"Eurasip Journal on Audio Speech and Music Processing\",\"volume\":\"85 1\",\"pages\":\"\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2024-01-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Eurasip Journal on Audio Speech and Music Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1186/s13636-023-00325-3\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eurasip Journal on Audio Speech and Music Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s13636-023-00325-3","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ACOUSTICS","Score":null,"Total":0}

引用次数: 0

摘要

语音编码是一种通过利用语音信号的统计特性来减少表示语音信号所需数据量的方法。最近，在语音编码过程中，神经网络预测模型作为非线性和非稳态语音信号的重构过程受到关注。本研究提出了一种新方法，通过使用基于门控递归单元（GRU）的自适应差分脉冲编码调制（ADPCM）系统来提高语音编码性能。该 GRU 预测器模型是利用来自 DARPA TIMIT 声韵连续语音语料库实际样本和 ADPCM 固定预测器输出语音样本的语音样本数据集进行训练的。我们的贡献在于开发了一种用于训练 GRU 预测模型的算法，该算法可以提高 GRU 预测模型在语音编码预测中的性能，同时还为语音解码器开发了一种新的离线训练预测模型。结果表明，所提出的系统显著提高了语音预测的准确性，证明了其在语音预测应用方面的潜力。总之，这项研究提出了 GRU 预测模型与 ADPCM 解码在语音信号压缩中的独特应用，为该领域的未来研究提供了一种前景广阔的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder

Speech coding is a method to reduce the amount of data needs to represent speech signals by exploiting the statistical properties of the speech signal. Recently, in the speech coding process, a neural network prediction model has gained attention as the reconstruction process of a nonlinear and nonstationary speech signal. This study proposes a novel approach to improve speech coding performance by using a gated recurrent unit (GRU)-based adaptive differential pulse code modulation (ADPCM) system. This GRU predictor model is trained using a data set of speech samples from the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus actual sample and the ADPCM fixed-predictor output speech sample. Our contribution lies in the development of an algorithm for training the GRU predictive model that can improve its performance in speech coding prediction and a new offline trained predictive model for speech decoder. The results indicate that the proposed system significantly improves the accuracy of speech prediction, demonstrating its potential for speech prediction applications. Overall, this work presents a unique application of the GRU predictive model with ADPCM decoding in speech signal compression, providing a promising approach for future research in this field.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Eurasip Journal on Audio Speech and Music Processing ACOUSTICS-ENGINEERING, ELECTRICAL & ELECTRONIC

CiteScore

4.10

自引率

4.20%

发文量

审稿时长

12 months

期刊介绍： The aim of “EURASIP Journal on Audio, Speech, and Music Processing” is to bring together researchers, scientists and engineers working on the theory and applications of the processing of various audio signals, with a specific focus on speech and music. EURASIP Journal on Audio, Speech, and Music Processing will be an interdisciplinary journal for the dissemination of all basic and applied aspects of speech communication and audio processes.