2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421): Latest Papers

PDF optimized parametric vector quantization of speech line spectral frequencies
A. D. Subramaniam, B. Rao
DOI: 10.1109/SCFT.2000.878407 (https://doi.org/10.1109/SCFT.2000.878407), published 2000-09-17
Abstract: A computationally efficient, high-quality vector quantization scheme based on a parametric probability density function (PDF) is developed for encoding speech line spectral frequencies (LSFs). Speech LSFs are modeled as i.i.d. realizations of a multivariate normal mixture density. The mixture model parameters are estimated efficiently from the training data using the expectation-maximization (EM) algorithm. The estimated density is then quantized using transform coding and bit-allocation techniques, for both fixed-rate and variable-rate systems. Source encoding with the resulting codebook involves no searches, and its computational complexity is minimal and independent of the rate of the system. Experimental results show that the proposed scheme provides a gain of 2-3 bits over conventional MSVQ schemes. The proposed memoryless quantizer is extended into a quantizer with memory, which provides transparent-quality speech at 20 bits/frame.
Citations: 148
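According to the abstract, the estimated mixture density is quantized with transform coding and bit allocation. The sketch below covers only the bit-allocation step. It assumes a single Gaussian component whose per-dimension variances are already known, for example the eigenvalues of its covariance after a KLT, and it distributes an integer bit budget greedily under the usual high-rate distortion model. The function name `allocate_bits` and the greedy rule are illustrative assumptions, not the paper's exact procedure.

```python
import heapq
from typing import List, Sequence

def allocate_bits(variances: Sequence[float], total_bits: int) -> List[int]:
    """Greedy integer bit allocation across transform coefficients.

    High-rate model: the MSE of a b-bit scalar quantizer on a component
    of variance s2 is proportional to s2 * 2**(-2*b). Each extra bit
    therefore divides that component's distortion by 4. The greedy step
    gives the next bit to the component with the largest distortion.
    """
    bits = [0] * len(variances)
    # heapq is a min-heap, so distortions are stored negated.
    heap = [(-v, i) for i, v in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        neg_dist, i = heapq.heappop(heap)
        bits[i] += 1
        heapq.heappush(heap, (neg_dist / 4.0, i))  # distortion / 4
    return bits
```

For variances 16, 4 and 1 with a 6-bit budget, `allocate_bits([16.0, 4.0, 1.0], 6)` returns `[3, 2, 1]`. In this case that matches the closed-form high-rate rule b_i = B/n + (1/2)·log2(σ_i² / geometric mean of the variances).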
Combined speech and audio coding by discrimination
L. Tancerel, S. Ragot, V. Ruoppila, R. Lefebvre
DOI: 10.1109/SCFT.2000.878435 (https://doi.org/10.1109/SCFT.2000.878435), published 2000-09-17
Abstract: In this paper we propose a general solution for combined speech and audio coding. In particular, we describe a speech/music discrimination procedure for multi-mode wideband coding. The speech/music decision is updated only when a low-energy frame is detected and is otherwise kept unchanged. The signal is classified using second-order statistics of discriminant parameters. An experimental CELP/transform coder operating at 16 kbit/s is demonstrated. Results show improved performance compared with single-mode encoding.
Citations: 26
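The switching rule in this abstract re-decides only on low-energy frames and holds the decision otherwise. It can be sketched as a small loop. This is a hedged illustration with several hypothetical pieces: the -40 dB threshold, the name `discriminate`, and the `classify` callback. The callback stands in for the paper's classifier, which uses second-order statistics of discriminant parameters.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]

def frame_energy_db(frame: Frame) -> float:
    """Mean-square energy of a frame in dB (tiny floor avoids log10(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)

def discriminate(frames: Sequence[Frame],
                 classify: Callable[[List[Frame]], str],
                 low_energy_db: float = -40.0,
                 initial: str = "speech") -> List[str]:
    """Return a 'speech' or 'music' mode for every frame.

    Frames are buffered into a segment. The (hypothetical) classifier is
    called on the buffered segment only when a low-energy frame arrives.
    The new decision is then held until the next low-energy frame.
    """
    decision = initial
    segment: List[Frame] = []
    modes: List[str] = []
    for frame in frames:
        segment.append(frame)
        if frame_energy_db(frame) < low_energy_db:
            decision = classify(segment)   # re-decide at a quiet frame
            segment = []                   # start a new segment
        modes.append(decision)
    return modes
```

Holding the decision between low-energy frames restricts mode changes to quiet points, presumably where a switch is least likely to be audible. The classifier sees the whole segment buffered since the last update.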
Design of an MPEG-4 general audio coder for improving speech quality
T. Moriya, A. Jin, N. Iwakami, T. Mori
DOI: 10.1109/SCFT.2000.878429 (https://doi.org/10.1109/SCFT.2000.878429), published 2000-09-17
Abstract: This paper proposes a design for an ISO/IEC MPEG-4 general audio encoder that improves speech quality at low bit rates. The improvement comes mainly from i) a higher sampling rate, which gives higher time resolution for a given frame length, and ii) adaptive preprocessing to reduce the bandwidth. Listening tests at 8 and 16 kbit/s compared the proposed coder with a conventional audio coder (MP3) and a speech-specific coder (MPEG-4 CELP). The proposed coder provided better speech quality than the conventional audio coder while maintaining the quality for music. For speech signals, however, the speech-specific coder produced significantly better quality than both audio coders. The proposed design will be especially useful for low-bit-rate audio-visual delivery applications that may include both speech and music signals.
Citations: 1
Robust signal/noise discrimination for wideband speech and audio coding
M. Jelinek, F. Labonte
DOI: 10.1109/SCFT.2000.878434 (https://doi.org/10.1109/SCFT.2000.878434), published 2000-09-17
Abstract: We present a robust discrimination method that separates information-carrying signals from ambient noise for wideband coding applications. The method is a two-stage procedure. First, a local decision based on a set of extracted parameters is used to update the estimated noise level; the parameters were chosen to reliably detect both speech and music signals. Then, the final decision is made based only on a frequency-dependent signal-to-noise ratio (SNR). The noise level update does not depend on the final decision, which prevents the discriminator from locking when the noise level changes suddenly. Performance is compared with that of the voice activity detector (VAD) of G.729, Annex G.
Citations: 4
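A minimal sketch of the two-stage structure described above, assuming per-band energies have already been computed for each frame. The following are hypothetical choices:

- the smoothing factor,
- the 6 dB threshold,
- averaging of per-band SNRs clipped at 0 dB,
- the `local_is_noise` callback, which stands in for the paper's parameter-based local decision.

The point illustrated is that the noise estimate is updated from the stage-1 decision only, never from the final output.

```python
import math
from typing import Callable, List, Sequence

class TwoStageDiscriminator:
    """Two-stage signal/noise discrimination on per-band frame energies."""

    def __init__(self, initial_noise: Sequence[float],
                 local_is_noise: Callable[[Sequence[float]], bool],
                 alpha: float = 0.9, snr_threshold_db: float = 6.0):
        self.noise: List[float] = list(initial_noise)  # per-band noise estimate
        self.local_is_noise = local_is_noise           # stage-1 rule (hypothetical)
        self.alpha = alpha                             # noise smoothing factor
        self.snr_threshold_db = snr_threshold_db

    def process(self, band_energy: Sequence[float]) -> bool:
        """Return True if the frame is judged to carry speech or music."""
        # Stage 1: only the local decision controls the noise update,
        # so a wrong final decision cannot freeze the noise estimate.
        if self.local_is_noise(band_energy):
            self.noise = [self.alpha * n + (1.0 - self.alpha) * e
                          for n, e in zip(self.noise, band_energy)]
        # Stage 2: frequency-dependent SNR, clipped at 0 dB, then averaged.
        eps = 1e-12
        snrs = [max(0.0, 10.0 * math.log10((e + eps) / (n + eps)))
                for e, n in zip(band_energy, self.noise)]
        return sum(snrs) / len(snrs) > self.snr_threshold_db
```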
Bandwidth extension of narrowband speech for low bit-rate wideband coding
J. Valin, R. Lefebvre
DOI: 10.1109/SCFT.2000.878425 (https://doi.org/10.1109/SCFT.2000.878425), published 2000-09-17
Abstract: Wireless telephone speech is usually limited to the 300-3400 Hz band, which reduces its quality. There is thus a growing demand for wideband speech systems that transmit from 50 Hz to 8000 Hz. This paper presents an algorithm that generates wideband speech from narrowband speech using as little as 500 bit/s of side information. The 50-300 Hz band is predicted from the narrowband signal. A source-excitation model is used for the 3400-8000 Hz band: the excitation is extrapolated at the receiver, and the spectral envelope is transmitted. Although some artifacts are present, the resulting wideband speech has better quality than narrowband speech.
Citations: 38
An efficient synthesis method for sinusoidal vocoders
T. Ramabadran, M. McLaughlin
DOI: 10.1109/SCFT.2000.878389 (https://doi.org/10.1109/SCFT.2000.878389), published 2000-09-17
Abstract: An FFT-based technique for synthesizing a "sum-of-sinusoids" signal is described. The technique is useful for efficient speech synthesis in sinusoidal-type vocoders. Each sinusoid to be synthesized is represented by a small number of FFT coefficients around the frequency of interest. Linear amplitude modulation of the sinusoid is approximated by convolution with a three-point sequence. Phase information is incorporated by multiplication with a phase constant. The FFT coefficients of all the sinusoids, processed as above, are added together and transformed by an IFFT. An appropriate section of the resulting time-domain signal is then extracted to obtain the desired "sum-of-sinusoids" signal. When the number of sinusoids is large, the proposed technique can reduce complexity by almost 45%.
Citations: 0
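For comparison, the direct time-domain computation that the FFT-based method is designed to replace takes only a few lines. This sketch is the baseline, not the paper's FFT/IFFT technique. The function name and the linear amplitude ramp over one frame are assumptions, chosen to mirror the abstract's description of linear amplitude modulation and per-sinusoid phase.

```python
import math
from typing import List, Sequence

def sum_of_sinusoids(freqs: Sequence[float], amps_start: Sequence[float],
                     amps_end: Sequence[float], phases: Sequence[float],
                     n_samples: int, fs: float) -> List[float]:
    """Direct synthesis of one frame as a sum of sinusoids.

    Sinusoid k has frequency freqs[k] in Hz and initial phase phases[k]
    in radians. Its amplitude ramps linearly from amps_start[k] toward
    amps_end[k] across the frame.
    """
    out = [0.0] * n_samples
    for f, a0, a1, phi in zip(freqs, amps_start, amps_end, phases):
        w = 2.0 * math.pi * f / fs                # radians per sample
        for n in range(n_samples):
            amp = a0 + (a1 - a0) * n / n_samples  # linear amplitude ramp
            out[n] += amp * math.cos(w * n + phi)
    return out
```

Each output sample costs one cosine evaluation per sinusoid, so the work grows as K·N for K sinusoids and N samples. The FFT-based approach reduces exactly this cost when K is large.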
Predictive quantization of spectral amplitudes for harmonic coders
R. Prased, W. Chan
DOI: 10.1109/SCFT.2000.878390 (https://doi.org/10.1109/SCFT.2000.878390), published 2000-09-17
Abstract: We present a novel predictive vector quantization scheme for coding the variable-dimension spectral amplitude vectors produced by harmonic coders. The scheme has a safety-net prediction structure, but it uses analysis-by-synthesis codebook search. A "closed-loop" codebook design algorithm is devised. Significant improvement over conventional predictive VQ design methods is obtained.
Citations: 0
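The safety-net idea can be sketched for a fixed vector dimension; the paper's variable-dimension handling and closed-loop codebook design are not shown. The sketch assumes a first-order predictor with a hypothetical coefficient `rho`. The encoder tries both a predictive path and a memoryless safety-net path, and keeps the reconstruction closer to the input. In practice the chosen path must also be signalled to the decoder. All names here are illustrative.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook: Sequence[Vector], target: Vector) -> int:
    """Index of the codevector closest to target (first one on ties)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], target))

def sn_pvq_encode(x: Vector, prev_hat: Vector, rho: float,
                  pred_cb: Sequence[Vector], safety_cb: Sequence[Vector]
                  ) -> Tuple[str, int, List[float]]:
    """Encode one frame with safety-net predictive VQ.

    Predictive path: quantize the residual x - rho * prev_hat, then add
    the prediction back. Safety-net path: quantize x directly with a
    memoryless codebook. Returns (path, index, reconstruction).
    """
    pred = [rho * p for p in prev_hat]
    resid = [a - b for a, b in zip(x, pred)]
    i_p = nearest(pred_cb, resid)
    rec_p = [c + p for c, p in zip(pred_cb[i_p], pred)]
    i_s = nearest(safety_cb, x)
    rec_s = list(safety_cb[i_s])
    # Keep whichever reconstruction is closer to the input vector.
    if sq_dist(x, rec_p) <= sq_dist(x, rec_s):
        return "predictive", i_p, rec_p
    return "safety", i_s, rec_s
```

When the input changes abruptly, the prediction from the previous frame is poor and the safety-net path wins. That is the failure mode the safety net is meant to cover.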
Wideband extension of telephone speech using a hidden Markov model
P. Jax, P. Vary
DOI: 10.1109/SCFT.2000.878427 (https://doi.org/10.1109/SCFT.2000.878427), published 2000-09-17
Abstract: In this paper we propose an algorithm to recover wideband speech from lowpass-bandlimited speech. The narrowband input signal is classified into a limited number of speech sounds, for which information about the wideband spectral envelope is taken from a pre-trained codebook. The codebook search uses a statistical approach based on a hidden Markov model, which takes different features of the bandlimited speech into account and minimizes a mean squared error criterion. The new algorithm needs only a single wideband codebook and inherently guarantees the transparency of the system in the base band. The enhanced speech exhibits a significantly larger bandwidth than the input speech without introducing objectionable artifacts.
Citations: 66
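This is a hedged sketch of how an HMM can produce a minimum-mean-squared-error envelope estimate. The function runs a normalized forward recursion over the states. It then returns the posterior-weighted mean of one codebook vector per state. The observation likelihoods are assumed to be supplied, for example by per-state Gaussian mixture models. The function and argument names are hypothetical, and the abstract does not specify these details.

```python
from typing import List, Sequence

def forward_mmse(likelihoods: Sequence[Sequence[float]],
                 trans: Sequence[Sequence[float]],
                 prior: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-frame MMSE estimates of a wideband envelope vector.

    likelihoods[t][i]: p(features of frame t | state i), assumed given.
    trans[j][i]: P(state i at frame t | state j at frame t-1).
    codebook[i]: wideband envelope vector associated with state i.
    """
    n = len(prior)
    dim = len(codebook[0])
    alpha = list(prior)
    estimates = []
    for t, lik in enumerate(likelihoods):
        if t > 0:
            # Predict: propagate last frame's posteriors through the transitions.
            alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) for i in range(n)]
        # Update with the current observation, then normalize to posteriors.
        alpha = [a * l for a, l in zip(alpha, lik)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
        # The posterior mean minimizes the expected squared error.
        estimates.append([sum(alpha[i] * codebook[i][d] for i in range(n))
                          for d in range(dim)])
    return estimates
```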
A novel algorithm for low bit rate speech compression using a hybrid LP-harmonics model
N. Abu-Shikhah, Mohamed Deriche
DOI: 10.1109/SCFT.2000.878388 (https://doi.org/10.1109/SCFT.2000.878388), published 2000-09-17
Abstract: We present a new LP-harmonic speech codec. At the encoder, the speech signal is pre-processed and an LP analysis is performed, together with pitch estimation and a voicing decision. At the decoder, when the frame is voiced, the encoded parameters are used to estimate the spectral envelope and to extract and classify the harmonics as either strong or weak, depending on their relative distance from multiples of the fundamental frequency. The parameters of the strong harmonics are then used to generate pure sinusoids, while the weak harmonics are used to generate a mixture of a pure sinusoid and a random-like signal. For unvoiced frames, the excitation of the LP filter is generated as a white noise signal. The proposed model mixes strong and weak periodic signals with random signals to produce an excitation that results in natural-sounding speech. Informal testing of the coder operating at 1.82 kb/s showed that the output speech has high intelligibility, with quality comparable to that of a 4 kb/s sinusoidal codec.
Citations: 0
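A small helper can illustrate the strong/weak harmonic decision described above. This is a minimal sketch under stated assumptions: peak frequencies and the fundamental f0 are given in Hz. The 10% tolerance on the distance from the nearest multiple of f0 is a hypothetical threshold, not a value from the paper.

```python
from typing import List, Sequence, Tuple

def classify_harmonics(peak_freqs: Sequence[float], f0: float,
                       tol: float = 0.1) -> List[Tuple[float, int, str]]:
    """Label spectral peaks as 'strong' or 'weak' harmonics.

    For each peak, take the nearest harmonic number k = round(f / f0)
    and the distance |f - k*f0| relative to f0. Peaks within tol (a
    hypothetical fraction of f0) are 'strong'; the rest are 'weak'.
    Returns (frequency, k, label) triples.
    """
    out = []
    for f in peak_freqs:
        k = max(1, round(f / f0))          # nearest harmonic index, at least 1
        rel_dist = abs(f - k * f0) / f0    # distance as a fraction of f0
        out.append((f, k, "strong" if rel_dist <= tol else "weak"))
    return out
```

In the codec described, strong harmonics would then drive pure sinusoids and weak ones a sinusoid-plus-noise mixture; that synthesis step is not shown.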
Partial-energy weighted interpolation of linear prediction coefficients
T. Islam, P. Kabal
DOI: 10.1109/SCFT.2000.878414 (https://doi.org/10.1109/SCFT.2000.878414), published 2000-09-17
Abstract: This paper discusses the interpolation of linear prediction (LP) coefficients. The performance of LP analysis with different numbers of subframes, and the choice of representation for the LP coefficients, are studied. Interpolation is done by converting the LP coefficients to one of the following representations: line spectral frequencies, reflection coefficients, log area ratios, or autocorrelations. Good performance is shown to be obtained with line spectral frequencies and five subframes per frame. A new interpolation technique that incorporates partial frame energy is introduced. This technique generalizes the concept of energy weighting to different LP coefficient representations.
Citations: 11
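Below is a minimal sketch of subframe interpolation in the line-spectral-frequency domain. It uses the five subframes that the abstract reports as performing well. Plain linear interpolation is standard. The `energy_weighted` variant is a hypothetical illustration that shifts the interpolation weight toward the higher-energy frame. It is not the paper's partial-energy formula, and the function names are illustrative.

```python
from typing import List, Sequence

def interpolate_lsf(prev: Sequence[float], curr: Sequence[float],
                    n_sub: int = 5) -> List[List[float]]:
    """Linearly interpolate LSF vectors (radians) over n_sub subframes.

    Subframe s puts weight a = (s + 1) / n_sub on the current frame's
    LSFs, so the last subframe uses them unchanged.
    """
    result = []
    for s in range(n_sub):
        a = (s + 1) / n_sub
        result.append([(1 - a) * p + a * c for p, c in zip(prev, curr)])
    return result

def energy_weighted(prev: Sequence[float], curr: Sequence[float],
                    e_prev: float, e_curr: float, a: float) -> List[float]:
    """HYPOTHETICAL energy-weighted variant, not the paper's formula.

    Scales the linear weight a by the two frame energies so that the
    interpolated LSFs lean toward the frame with more energy. Positive
    energies keep the weight w in [0, 1].
    """
    w = a * e_curr / (a * e_curr + (1 - a) * e_prev)
    return [(1 - w) * p + w * c for p, c in zip(prev, curr)]
```

A convex combination of two increasing LSF vectors is still increasing, so both variants keep the interpolated LSFs ordered. This is one reason LSFs interpolate more safely than direct-form LP coefficients.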