{"title":"MMSE decoding for vector quantization over channels with memory","authors":"Heng-Iang Hsu, Wen-Whei Chang, Xiaobei Liu, S. Koh","doi":"10.1109/SCW.2002.1215728","DOIUrl":"https://doi.org/10.1109/SCW.2002.1215728","url":null,"abstract":"The paper presents memory-enhanced extensions of minimum mean-squared error (MMSE) decoding for vector quantization over noisy channels. We also develop a recursive algorithm for computing the transition probabilities of the Gilbert channel, and illustrate its performance in vector quantization of Gauss-Markov sources under noisy channel conditions. Simulation results indicate that the proposed algorithm enables the implementation of an MMSE decoder with increased robustness to channel errors.","PeriodicalId":140750,"journal":{"name":"Speech Coding, 2002, IEEE Workshop Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131395131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An iterative interpolative transform method for modeling harmonic magnitudes","authors":"T. Ramabadran, A. Smith, M. Jasiuk","doi":"10.1109/SCW.2002.1215716","DOIUrl":"https://doi.org/10.1109/SCW.2002.1215716","url":null,"abstract":"In this paper, we describe a method for modeling speech harmonic magnitudes, the accurate representation of which is essential for high quality speech synthesis in several parametric vocoders. The given set of harmonic magnitudes is interpolated and transformed into the auto-correlation domain before an all-pole model is derived. Through an iterative procedure, the interpolation curve used in the frequency domain is improved. This new iterative, interpolative, transform (IIT) method has been found to model the harmonic magnitudes more accurately than earlier methods when measured in terms of log-spectral distortion.","PeriodicalId":140750,"journal":{"name":"Speech Coding, 2002, IEEE Workshop Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123062564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perceptual QoS assessment methodologies for coded speech in networks","authors":"N. Kitawaki","doi":"10.1109/SCW.2002.1215730","DOIUrl":"https://doi.org/10.1109/SCW.2002.1215730","url":null,"abstract":"The paper reviews perceptual QoS assessment methodologies for coded speech in networks. The methods are mainly based on my contributions to the ITU-T Study Group 12 since 1981. First, quality factors in communications networks are analyzed, and then appropriate assessment methods for coded speech are discussed. Finally, the current status for perceptual QoS measurement methodologies is described from the viewpoint of a network planning tool for compound quality factors in communications networks.","PeriodicalId":140750,"journal":{"name":"Speech Coding, 2002, IEEE Workshop Proceedings.","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124931303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A packet loss concealment method using pitch waveform repetition and internal state update on the decoded speech for the sub-band ADPCM wideband speech codec","authors":"M. Serizawa, Y. Nozawa","doi":"10.1109/SCW.2002.1215726","DOIUrl":"https://doi.org/10.1109/SCW.2002.1215726","url":null,"abstract":"The paper proposes a packet loss concealment (PLC) method for the SB-ADPCM (sub-band adaptive differential pulse code modulation) wideband speech codec. When a packet loss occurs, the concealment repeats a pitch waveform of the speech decoded in the past with attenuation to generate a speech waveform corresponding to the lost packet. The packet loss causes differences in the internal states, such as prediction filter states, between encoding and decoding of the SB-ADPCM codec. This difference results in an annoying click noise during the period following the packet loss. The proposed method reduces this difference by updating the internal state based on the speech decoded by the concealment in the past. It also employs a forgetting factor control for the internal states, which reduces the impact on the internal states from the packet loss. Results from a five-grade mean opinion test show that the proposed method achieves around 3 (fair) or 4 (good) speech quality at a loss rate lower than 5%, and 0.4 through 1.0 higher quality compared to the conventional muting PLC method at packet loss rates of 1 to 10% with a packet size of 10 or 20 msec.","PeriodicalId":140750,"journal":{"name":"Speech Coding, 2002, IEEE Workshop Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115571987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The analysis of speech codecs using psychoacoustic measures","authors":"Mohammed Raad, C. Ritz, I. Burnett, A. Mertins","doi":"10.1109/SCW.2002.1215740","DOIUrl":"https://doi.org/10.1109/SCW.2002.1215740","url":null,"abstract":"This paper analyses two narrowband speech codecs, the 4.8 kbit/s FS1016 coder and the 8 kbit/s G729 coder, using objective psychoacoustic measures. Four measures are used: loudness, sharpness, roughness and tonality. The results show sharpness and roughness as the two major contributing factors to the subjective difference between the two coders.","PeriodicalId":140750,"journal":{"name":"Speech Coding, 2002, IEEE Workshop Proceedings.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129459525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wideband speech coder employing T-codes and reversible variable length codes","authors":"Hongqiang Wang, S. Koh, G. Shu","doi":"10.1109/SCW.2002.1215743","DOIUrl":"https://doi.org/10.1109/SCW.2002.1215743","url":null,"abstract":"The performance of speech coders, such as the ITU-T G.722.1 wideband speech coder, that employ nonself-synchronizing variable length codes is greatly affected when the received bit stream is in error. This paper studies the use of T-codes and reversible variable length codes (RVLC) to replace the Huffman codes recommended in the G.722.1 coder in order to improve its robustness when bit errors occur. Preliminary simulation results show significant improvement in coder performance with the proposed schemes.","PeriodicalId":140750,"journal":{"name":"Speech Coding, 2002, IEEE Workshop Proceedings.","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121163151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech and noise separations using comb filtering method for high quality speech coding","authors":"Y. Wang, K. Yoshida","doi":"10.1109/SCW.2002.1215739","DOIUrl":"https://doi.org/10.1109/SCW.2002.1215739","url":null,"abstract":"This paper presents speech and a noise separation methods for achieving high quality of speech coding. In speech separation, a pitch harmonics restoration method is proposed. This method can effectively suppress the so-called musical noise and reduce speech distortion. In noise separation, the noise-base estimated in the speech separation process is used as a separated background noise and a method for encoding the noise with low bit rates is proposed. The proposed methods used as a preprocessor of the adaptive multi-rate wideband (AMR-WB) coder are evaluated by the degradation category rating (DCR) test. An average of 0.3-point improvement in performance under the noise conditions is achieved compared with the conventional method without using speech and noise separations.","PeriodicalId":140750,"journal":{"name":"Speech Coding, 2002, IEEE Workshop Proceedings.","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127941579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantization noise spectral shaping in instantaneous coding of spectrally unbalanced speech signals","authors":"G. Mahé, A. Gilloire","doi":"10.1109/SCW.2002.1215722","DOIUrl":"https://doi.org/10.1109/SCW.2002.1215722","url":null,"abstract":"In the context of centralized spectral equalization of speech in a telephone network, the signal is spectrally strongly unbalanced at the output of the equalizer, before being quantized, which results in low SNR at the receiver. We propose and evaluate experimentally two methods to reshape the quantization noise, in order to make it less perceptible in reception. The first one consists in finding the most probable quantization sequence, given the desired noise spectrum. In the second one, the filtered quantization error is added to the signal to be quantized.","PeriodicalId":140750,"journal":{"name":"Speech Coding, 2002, IEEE Workshop Proceedings.","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125533580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable coder designed for 10-kHz bandwidth speech","authors":"M. Oshikiri, H. Ehara, K. Yoshida","doi":"10.1109/SCW.2002.1215741","DOIUrl":"https://doi.org/10.1109/SCW.2002.1215741","url":null,"abstract":"This paper presents a scalable speech coder with rate of 23.85-kbit/s to encode 10-kHz bandwidth speech signals. The perceptual quality of the 10-kHz bandwidth speech signals is much better than that of 7-kHz bandwidth ones, and it is close to that of 20-kHz bandwidth ones. The 10-kHz bandwidth is therefore promising for high-fidelity conversational applications. The scalable coder consists of two layers: a base-layer and an enhancement-layer. The adaptive multi-rate wideband speech coder (AMR-WB) at 15.85-kbit/s and a transform coding method at 8-kbit/s are utilized for the base-layer and the enhancement-layer, respectively. This hybrid structure ensures the efficient coding of the 10-kHz bandwidth speech. In enhancement-layer, the modified discrete cosine transform (MDCT) is exploited. Its analysis frame size is set to be short in order to minimize additional algorithmic delay. The total additional algorithmic delay of the enhancement-layer is 5-ms. Since it is difficult to quantize all the MDCT coefficients at 8-kbit/s, we have limited the region for quantization from 6-kHz to 9-kHz to improve the perceptual quality of decoded speech. Our subjective evaluation test results indicate the quality of the proposed coder clearly exceeds that of AMR-WB at 23.85-kbit/s under both clean and noise conditions.","PeriodicalId":140750,"journal":{"name":"Speech Coding, 2002, IEEE Workshop Proceedings.","volume":"229 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121478822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 1200/2400 bps coding suite based on MELP","authors":"Tian Wang, K. Koishida, V. Cuperman, A. Gersho, J. Collura","doi":"10.1109/SCW.2002.1215734","DOIUrl":"https://doi.org/10.1109/SCW.2002.1215734","url":null,"abstract":"This paper presents key algorithm features of the future NATO narrow band voice coder (NBVC), a 1.2/2.4 kbps speech coder with noise preprocessor based on the MELP analysis algorithm. At 1.2 kbps, the MELP parameters for three consecutive frames are grouped into a superframe and jointly quantized to obtain high coding efficiency. The inter-frame redundancy is exploited with distinct quantization schemes for different unvoiced/voiced (U/V) frame combinations in the superframe. Novel techniques used at 1.2 kbps include pitch vector quantization using pitch differentials, joint quantization of pitch and U/V decisions and LSF quantization with a forward-backward interpolation method. A new harmonic synthesizer is introduced for both rates which improves the reproduction quality. Subjective test results indicate that the 1.2 kbps speech coder achieves quality close to the existing federal standard 2.4 kbps MELP coder.","PeriodicalId":140750,"journal":{"name":"Speech Coding, 2002, IEEE Workshop Proceedings.","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131114522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}