2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421): Latest Articles

A novel approach to excitation coding in low-bit-rate high-quality CELP coders
V. Cuperman, A. Gersho, J. Linden, A. Rao, Tung-Chiang Yang, S. Ahmadi, R. Heidari, Fenghua Liu
{"title":"A novel approach to excitation coding in low-bit-rate high-quality CELP coders","authors":"V. Cuperman, A. Gersho, J. Linden, A. Rao, Tung-Chiang Yang, S. Ahmadi, R. Heidari, Fenghua Liu","doi":"10.1109/SCFT.2000.878378","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878378","url":null,"abstract":"A significant improvement in the efficiency of excitation coding with CELP at low bit rates is achieved by a new paradigm for encoding the fixed excitation. In the proposed scheme, the non-zero fixed-codebook excitation elements are substantially localized in a set of windows, with positions adaptive to the pitch peaks. Highly efficient coding is thus achieved by allocating most of the available excitation bits to capture the essential excitation events. The paradigm is validated by computer simulation of a variable-rate speech codec. The performance of the codec is evaluated by informal subjective tests and compared with TIA standard variable rate speech codecs. The results indicate that the proposed scheme can be used to reproduce speech at average bit rates from 2.3 to 3.4 kbps (i.e., in a two-way communication scenario) with very high quality and intelligibility.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114788044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Acoustic front-end processing for communication systems
G. Elko
{"title":"Acoustic front-end processing for communication systems","authors":"G. Elko","doi":"10.1109/SCFT.2000.878424","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878424","url":null,"abstract":"Summary form only given. As communication systems have become more mobile and portable, we now have situations where audio communication in difficult acoustic environments is common. Speech coders at low bit rates tend to have problems with non-speech signals that are typically found in noisy acoustic environments. As a result, there can be degradation in the perceived audio quality for low-bit rate coders in noisy environments. One solution that has not been well studied, is the interaction of the acoustic front-end with speech coders in these difficult environments. Clearly an understanding of the end-to-end communication channel including the acoustic front-end will be necessary in order to optimize signal coding. Another area where acoustic front-end processing will play a large role in speech and audio coding is Internet protocol (IP) based communication networks. IP networks enable us to \"easily\" deliver wider bandwidth and multiple channels of audio for more realistic and transparent telecommunication. Sound transduction for hands-free teleconferencing has audio quality issues similar to the mobile communication problem. This article concentrates on some of the work that we have been doing at Bell Labs on the hands-free telecommunication problem with an emphasis on microphone arrays and acoustic echo cancellation. Finally, some new areas for possible collaboration between the speech coding and electroacoustics communities to improve mobile and hands-free communication are suggested.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126362609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Exploiting simultaneously masked linear prediction in a WI speech coder
J. Lukasiak, I. Burnett
{"title":"Exploiting simultaneously masked linear prediction in a WI speech coder","authors":"J. Lukasiak, I. Burnett","doi":"10.1109/SCFT.2000.878377","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878377","url":null,"abstract":"This paper uses a method of incorporating simultaneous masking into the calculation of a linear predictive filter (SMLPC) as the front end to a 2 kbps waveform interpolation (WI) speech coder. A modification to the masking threshold calculation used in SMLPC is proposed. This modification improves the performance of SMLPC in noise like sections by placing greater emphasis on strongly voiced speech. MOS test results reveal that the modified SMLPC improved the perceptual quality of the WI coder. The improvement is significant for female speakers whilst the quality for male speech is virtually unchanged. This result conflicts with previous results reported for SMLPC where only male speech was improved. The change is attributed to the modification of the masking threshold and confirms that adapting the masking threshold according to the pitch of the speech will allow SMLPC to remove more perceptually important information from all input speech than standard LPC.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134250444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Signal processing for cochlear implants and low-rate speech coding
P. Loizou
{"title":"Signal processing for cochlear implants and low-rate speech coding","authors":"P. Loizou","doi":"10.1109/SCFT.2000.878398","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878398","url":null,"abstract":"Summary form only given. Cochlear implants are now established as a new option for individuals with profound (sensorineural) hearing impairment. Many of the cochlear implant patients are able to understand speech without lip-reading, and some can communicate over the phone. The success of cochlear implants can be attributed to the combined efforts of scientists from various disciplines including bioengineering, physiology, and signal processing. Signal processing, in particular, played an important role in the development of various techniques for deriving electrical stimuli from the speech signal. Depending on the type of spectral information that was extracted from the acoustic signal, different speech processing strategies were developed over the years. The amount of spectral information that can be derived from the speech signal and delivered to the electrodes is limited, since the implant users have a small number (6-22) of electrodes. The designers of cochlear implants, much like the designers of speech coders, are therefore faced with the challenge of developing signal processing strategies that can extract a small, yet sufficient, amount of spectral information from the speech signal without compromising speech intelligibility and/or quality. Cochlear implants also provide us with a unique opportunity to study speech perception and investigate the perceptual limits of the auditory system. We can investigate, for example, the effect of limited spectral and intensity resolution on speech understanding, and ask questions such as \"What is the smallest number of channels or what is the smallest number of discriminable intensity steps needed to understand speech?\" The answers to such questions could potentially be used for the design of very low rate speech coders. This article provides an overview of various signal-processing techniques that have been used for cochlear prosthesis over the past 25 years, and also presents some results from our intelligibility studies on the number of channels and quantization steps needed to understand speech.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114634315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Enhancing waveform interpolative coding with weighted REW parametric quantization
O. Gottesman, A. Gersho
{"title":"Enhancing waveform interpolative coding with weighted REW parametric quantization","authors":"O. Gottesman, A. Gersho","doi":"10.1109/SCFT.2000.878391","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878391","url":null,"abstract":"This paper presents an efficient quantization technique for the rapidly-evolving waveforms in waveform interpolative (WI) coders. The scheme, based on a parametrization of the rapidly-evolving waveform (REW) magnitude, and analysis-by-synthesis (AbS) vector quantization (VQ) of the REW parameters, allows both higher temporal and spectral resolution of the REW. A perceptually weighted distortion measure takes advantage of spectral and temporal masking and leads to improved reconstructed speech quality, most notably in mixed voiced and unvoiced speech segments. The technique is an important component of the enhanced waveform interpolative (EWI) speech coder at 2.8 kbps that achieves a subjective quality slightly better than that of G.723.1 at 6.3 kbps.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124037992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Normalization and polygon error detection for split VQ of line spectral frequencies
A.M. Smith, J. P. Ashley, M. Jasiuk, Weiming Peng
{"title":"Normalization and polygon error detection for split VQ of line spectral frequencies","authors":"A.M. Smith, J. P. Ashley, M. Jasiuk, Weiming Peng","doi":"10.1109/SCFT.2000.878422","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878422","url":null,"abstract":"A technique for improving the performance of split vector quantization (VQ) of line spectral frequency (LSF) parameters is presented, wherein a normalization process is used to make all combinations of the codebook entries valid for preserving the LSF ordering property. An undesirable consequence of normalization is a reduction in the ability to detect bit errors in the LSF codebook indices by exploiting the ordering property. A new method, polygon error detection (PED), is presented which shows promise for overcoming this problem. The use of PED is also shown to be beneficial when normalization is not performed.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132976605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Efficient parameter quantisation for 2.4/1.2 kb/s split-band LPC coding
S. Villette, Y. Cho, A. Kondoz
{"title":"Efficient parameter quantisation for 2.4/1.2 kb/s split-band LPC coding","authors":"S. Villette, Y. Cho, A. Kondoz","doi":"10.1109/SCFT.2000.878385","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878385","url":null,"abstract":"Speech coding at very low bit rates has many applications such as answering machines, IP telephony, mobile communications, military communications etc. Most low bit rate coders operate at around 2.4 kb/s, as the speech quality degrades too much below this bit rate. We describe a frequency domain speech coder capable of operating at both 2.4 and 1.2 kb/s, and produces good quality synthesised speech. Both rates use the same analysis and synthesis building blocks over 20 ms, but the 1.2 kb/s coder jointly quantises three sets of parameters every 60 ms to reduce the bit rate while maintaining speech quality. We also describe the quantisation methods used to lower the bit rate from 2.4 kb/s to 1.2 kb/s while retaining most of the quality of the higher bit rate version.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116084117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Voicing detection in DAP-STC
M. S. Ho, D. J. Molyneux, B. Cheetham
{"title":"Voicing detection in DAP-STC","authors":"M. S. Ho, D. J. Molyneux, B. Cheetham","doi":"10.1109/SCFT.2000.878386","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878386","url":null,"abstract":"Sinusoidal transform coding (STC) requires an all-pole representation of spectra derived periodically from the short-term speech spectral envelope and a \"voicing probability\" frequency f_v to divide each spectrum into two sub-bands: voiced below f_v and unvoiced above f_v. Discrete all-pole (DAP) modeling may be applied to STC to improve the accuracy of the short-term spectral envelope for voiced speech with modifications to accommodate unvoiced speech and spectra which do not conform well to an all-pole model. This paper presents a novel approach to the determination of f_v which is appropriate when DAP is employed. It is a frequency-domain algorithm with an analysis-by-synthesis optimisation process. This approach improves the accuracy of DAP-STC modeled speech.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127644347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
An information theoretic perspective on the speech spectrum process
F. Norden, T. Eriksson, P. Hedelin
{"title":"An information theoretic perspective on the speech spectrum process","authors":"F. Norden, T. Eriksson, P. Hedelin","doi":"10.1109/SCFT.2000.878409","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878409","url":null,"abstract":"In this paper, an information theoretic study of properties of the speech spectrum process is performed. Various techniques to model the probability density function are applied to the spectrum source to compute rate-distortion functions. We estimate the difference in the required rate to achieve a given distortion for three different scenarios: interframe gain exploitation, low-pass filtering of LPC vectors and increased speech signal bandwidth. We obtain fairly consistent results for the different methods of calculating rate-distortion functions. The results show that for close to transparent LPC quantization we gain 4-6 bits per frame by exploiting first order interframe correlation. The new idea of using low-pass filtered LPC vectors has been shown to decrease the coding cost by 1-3 bits per frame, depending on the cutoff frequency.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"182 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132669370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Dichotic presentation of interleaving critical-band envelopes: an application to multi-descriptive coding
O. Ghitza, P. Kroon
{"title":"Dichotic presentation of interleaving critical-band envelopes: an application to multi-descriptive coding","authors":"O. Ghitza, P. Kroon","doi":"10.1109/SCFT.2000.878400","DOIUrl":"https://doi.org/10.1109/SCFT.2000.878400","url":null,"abstract":"A coding paradigm is proposed which is based solely on the properties of the human auditory system and does not assume any specific source properties. Hence, its performance is equally good for speech, noisy speech, and music signals. The signal decomposition in the proposed paradigm takes advantage of binaural properties of the human auditory system. This also leads to a natural multi-descriptive signal representation.","PeriodicalId":359453,"journal":{"name":"2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134314926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3