{"title":"Analysis-by-synthesis voicing cut-off determination in harmonic coding","authors":"Wenhui Jia, W. Chan","doi":"10.1109/SCFT.2000.878397","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878397","url":null,"abstract":"In low bit-rate harmonic speech coding, voicing information is often specified by a cut-off frequency of the spectrum. Many approaches of cut-off estimation depend on spectral matching, where a fixed prototype spectrum is used to model voiced harmonics. However, voiced harmonics do not always show a regular shape. One of the causes is harmonic interference. We propose an analysis-by-synthesis voicing cut-off determination scheme that takes into account harmonic interactions in spectral matching. The proposed scheme has been embedded in a 2.4 kb/s harmonic coder. Subjective listening tests show that the scheme performs well and is robust against noise.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129941456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Markov chain prediction for missing speech frame compensation","authors":"M. A. Kohler, R. Yarlagadda","doi":"10.1109/SCFT.2000.878402","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878402","url":null,"abstract":"Transmitting voice over packet-switched networks, such as the Internet, is an appealing communication alternative to the traditional wireline system. The ability to lower the cost of long-distance telephone calls and provide additional capabilities is attracting customers worldwide to this tool. However, many current packet-switched protocols cannot guarantee real-time delivery of packets. When voice packets are lost, deleted, or excessively delayed in the network, the receiver must provide something for the listener to hear. This paper describes Markov chain prediction, a technique for compensating when speech frames are missing. It outperforms venerable frame repetition using both subjective and objective measurements.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127802999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model based spectrum prediction","authors":"J. Lindblom, J. Samuelsson, Per Hedelin","doi":"10.1109/SCFT.2000.878419","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878419","url":null,"abstract":"This paper presents methods for speech spectrum prediction based on Gaussian mixture models. Spectrum prediction may be useful in a packet transmission system where the sensitivity to packet losses is a major problem. Models of speech are trained by the expectation maximization algorithm using pairs, triples etc. of consecutive cepstral vectors. The models are used to design first, second etc. order predictors. The prediction schemes are evaluated using the spectral distortion criterion and compared to a simple reference method. The best prediction scheme obtains an average spectral distortion that is 0.46 dB less than for the reference method.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128328853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved signal analysis and time-synchronous reconstruction in waveform interpolation coding","authors":"N. Chong-White, Ian Burnett","doi":"10.1109/SCFT.2000.878394","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878394","url":null,"abstract":"This paper presents a waveform-matched waveform interpolation (WMWI) technique which enables improved speech analysis over existing WI coders. In WMWI, an accurate representation of speech evolution is produced by extracting critically-sampled pitch periods of a time-warped, constant pitch residual. The technique also offers waveform-matching capabilities by using an inverse warping process to near-perfectly reconstruct the residual. Here, a pitch track optimisation technique is described which ensures the speech residual can be effectively decomposed and quantised. Also, the pitch parameters required to efficiently quantise and recreate the pitch track, on a period-by-period basis, are identified. This allows time-synchrony between the original and decoded signals to be preserved.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134171063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Results on reverse water-filling, SNR, and log-spectral error in codebook-based coding","authors":"S. Voran","doi":"10.1109/SCFT.2000.878387","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878387","url":null,"abstract":"This paper identifies optimum levels of reverse water-filling for codebook-based coding of noise and speech signals. We find that there is little to be gained from optimizing an effective rate parameter. We identify trade-offs between SNR and log-spectral error. We show that the use of a gain factor compares favorably with reverse water-filling in some situations.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"445 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133246903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Changes in voice quality judgments as a function of background noise level in the listening environment","authors":"L. Thorpe, R. Rabipour","doi":"10.1109/SCFT.2000.878382","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878382","url":null,"abstract":"This study explores the extent to which differences in voice quality with different bit rates become less perceptible when users are listening in a noisy environment. The individual rate modes of two multi-rate codecs were rated by listeners in various background noise conditions, including a quiet baseline, crowd babble, street noise, factory noise, and two levels of car noise. The results suggest that in some cases a lower bit-rate codec can be substituted without an associated drop in perceived quality when the listener is in a noisy location. Based on this effect, it would be possible to increase the system capacity or allow graceful handling of network overload by reducing transmission bandwidth allocated to receivers in high background noise without associated reduction in perceived voice quality.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116080017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On quantizer dimensions in joint speech/channel coding","authors":"Tim Fingscheidt, T. Hindelang, V. Richard, Nambi Seshadri","doi":"10.1109/SCFT.2000.878404","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878404","url":null,"abstract":"In mobile speech communication usually vector quantization (VQ) is employed to ensure high coding efficiency. Convolutional coding and softbit speech decoding can add a considerable amount of robustness. VQ as well as softbit decoding however can be a quite complex task. Under the constraint of a constant gross bit rate and clean channel quality we propose the use of lower dimensional VQ or even scalar quantization (SQ) with a higher bit rate which leaves then fewer redundancy to be added by channel coding. This concept of joint speech/channel coding with its suboptimal speech coder and weaker channel coder can efficiently employ softbit speech decoding yielding a low overall complexity transmission scheme. Cases are shown where its performance is even better as compared to a high dimensional VQ with softbit decoding.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127002961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of multidimensional scaling to subjective evaluation of coded speech","authors":"J. L. Hall","doi":"10.1109/SCFT.2000.878380","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878380","url":null,"abstract":"We propose a new procedure for subjective evaluation of coded speech. This procedure has the potential of providing an anchorable measure of quality that contains more information than the single number provided by MOS testing. A stimulus space and the relationship between this space and speech quality are established with multidimensional scaling techniques in a large-scale listening test. In the field, the user uses a method described in this report to position a stimulus under evaluation in this previously-established space, and from this position the user draws conclusions about speech quality. The stimulus space is created by the multidimensional scaling program INDSCAL, which operates on subjective judgments of dissimilarities between samples of speech to create a stimulus space in which distances between stimuli correspond to perceptual dissimilarities. The stimulus space has the additional property that its dimensions correspond to perceptual attributes of the stimuli. In a pilot experiment, stimulus spaces for utterances produced by a male and a female talker were found to be highly correlated. MOS scores obtained in a separate study were found to be highly correlated with position in the stimulus space. We discuss both the physical and perceptual correlates of the three dimensions.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128499370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"4 kb/s improved multi-pulse based CELP speech coding with multiple location codebook and post-processing","authors":"K. Ozawa","doi":"10.1109/SCFT.2000.878379","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878379","url":null,"abstract":"This paper proposes an improved MP-CELP (Multi-Pulse-based CELP) speech coding at 4 kb/s. In MP-CELP, amplitudes or signs of multi-pulse excitation are simultaneously vector quantized (VQ). In order to improve speech quality for voiced speech, a multiple pulse location codebook is stored to enhance the coverage of the location. The optimum combination among the pulse location codebook, pulse amplitude codevector and gain codevector is searched for and selected. In order to be robust against background noise, a post-processing efficiently reduces temporal fluctuation for the excitation signal. The subjective evaluation results show that speech quality for 4 kb/s improved MP-CELP is equivalent to that for ITU-T G.726 (32 kb/s) and G.729 (8 kb/s) for both M-IRS and flat clean speech. For background noise conditions, 4 kb/s speech quality is close to that for ITU-T G.726 (32 kb/s).","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131217499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Very low rate speech coding using temporal decomposition and waveform interpolation","authors":"C. Ritz, I. Burnett, J. Lukasiak","doi":"10.1109/SCFT.2000.878384","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878384","url":null,"abstract":"In very low rate coding the aim is to accurately represent speech characteristics as efficiently as possible. High coding gains for the spectral features can be achieved through the use of temporal decomposition. Waveform interpolation coders accurately represent the excitation using characteristic waveforms (CWs) extracted at a constant rate. In this paper, the two approaches are combined into a very low rate coder operating at around 1 kbps. It is shown that the evolution of the excitation is related to the evolution of the speech spectrum. To minimise bit rates, the transmission of CWs is adapted to the spectral parameter evolution using the parameters derived from temporal decomposition of the spectral parameters.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123874742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}