Recursive Randomized Tree Coding of Speech
Hoontaek Oh, J. Gibson
2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR), published 2022-08-01
DOI: 10.1109/MIPR54900.2022.00020 (https://doi.org/10.1109/MIPR54900.2022.00020)
Citation count: 0
Abstract
We study a recursively adaptive architecture for speech coding built on tree coding. The short-term predictor combines recursive least squares lattice estimation of the autoregressive component with gradient-based estimation of the moving average part, and the long-term predictor uses gradient/autocorrelation-based algorithms. All of these components adapt to minimize the perceptually weighted reconstruction error. We introduce and explore the new idea of concatenated, randomized multitrees. Voice activity detection (VAD) and comfort noise generation (CNG) are included to reduce both the bit rate and the number of computations required. We compare performance against the widely deployed AMR (Adaptive Multi-Rate) codec and demonstrate comparable performance at bit rates of 4.5 to 7.5 kbit/s.
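The abstract does not spell out the tree search itself. As a rough illustration of the general idea behind tree coding (delayed-decision search over candidate reconstruction paths), here is a minimal sketch using the classic M-algorithm at 1 bit per sample. The function names, the fixed first-order predictor coefficient `a`, and the step size `step` are assumptions made for this toy example. It does not reproduce the paper's adaptive RLS lattice predictor, perceptual weighting, or randomized multitree structure.

```python
def tree_encode(x, m_paths=4, step=0.5, a=0.9):
    """Toy delayed-decision tree search (M-algorithm), 1 bit per sample.

    Hypothetical coder for illustration: each path predicts the next sample
    as a * (its previous reconstruction) and adds +step or -step per bit.
    """
    # Each survivor is (cumulative squared error, bits so far, last reconstruction).
    survivors = [(0.0, [], 0.0)]
    for sample in x:
        candidates = []
        for err, bits, prev in survivors:
            for bit in (0, 1):
                recon = a * prev + (step if bit else -step)
                new_err = err + (sample - recon) ** 2
                candidates.append((new_err, bits + [bit], recon))
        # Prune: keep only the m_paths lowest-distortion paths.
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:m_paths]
    best_err, best_bits, _ = survivors[0]
    return best_bits, best_err


def tree_decode(bits, step=0.5, a=0.9):
    """Rebuild the reconstruction from the bit path with the same toy predictor."""
    recon, out = 0.0, []
    for bit in bits:
        recon = a * recon + (step if bit else -step)
        out.append(recon)
    return out


if __name__ == "__main__":
    signal = [0.3, 0.8, 1.0, 0.6, -0.1, -0.7]
    bits, err = tree_encode(signal, m_paths=4)
    print(bits, err, tree_decode(bits))
```

With `m_paths=1` the search reduces to a greedy sample-by-sample decision. Larger values keep more candidate paths alive before committing, which is the trade-off between distortion and computation that tree coders exploit.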