Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing: Latest Articles

IPA: improved phone modelling with recurrent neural networks
T. Robinson, M. Hochberg, S. Renals
{"title":"IPA: improved phone modelling with recurrent neural networks","authors":"T. Robinson, M. Hochberg, S. Renals","doi":"10.1109/ICASSP.1994.389361","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389361","url":null,"abstract":"This paper describes phone modelling improvements to the hybrid connectionist-hidden Markov model speech recognition system developed at Cambridge University. These improvements are applied to phone recognition from the TIMIT task and word recognition from the Wall Street Journal (WSJ) task. A recurrent net is used to map acoustic vectors to posterior probabilities of phone classes. The maximum likelihood phone or word string is then extracted using Markov models. The paper describes three improvements: connectionist model merging; explicit presentation of acoustic context; and improved duration modelling. The first is shown to provide a significant improvement in the TIMIT phone recognition rate and all three provide an improvement in the WSJ word recognition rate.","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121597920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 46
Optimal entropy constrained scalar quantization for exponential and Laplacian random variables
G. Sullivan
{"title":"Optimal entropy constrained scalar quantization for exponential and Laplacian random variables","authors":"G. Sullivan","doi":"10.1109/ICASSP.1994.389481","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389481","url":null,"abstract":"This paper presents solutions to the entropy-constrained scalar quantizer (ECSQ) design problem for two sources commonly encountered in image and speech compression applications: sources having exponential and Laplacian probability density functions. We obtain the optimal ECSQ either with or without an additional constraint on the number of levels in the quantizer. In contrast to prior methods, which require iterative solution of a large number of nonlinear equations, the new method needs only a single sequence of solutions to one-dimensional nonlinear equations (in some Laplacian cases, one additional two-dimensional solution is needed). As a result, the new method is orders of magnitude faster than prior ones. We also show that as the constraint on the number of levels in the quantizer is relaxed, the optimal ECSQ becomes a uniform threshold quantizer (UTQ) for exponential, but not for Laplacian sources.","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131995205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Spectral quantization of cepstral coefficients
R. Hagen
{"title":"Spectral quantization of cepstral coefficients","authors":"R. Hagen","doi":"10.1109/ICASSP.1994.389244","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389244","url":null,"abstract":"Studies the cepstral coefficients as a suitable representation of the linear prediction filter for spectral coding purposes. Spectral coding methods in predictive speech coders are usually evaluated using the spectral distance measure. The average spectral distance combined with a measure of the percentage of spectra with high distortion are used to predict the perceptual quality when quantizing the prediction filter. The authors show that the spectral distance is equivalent to a squared error in the cepstral domain. Methods for spectral quantization using vector quantization of cepstral coefficients are analyzed. Better results than for quantization of line spectrum frequencies are reported for both single-stage VQ at 11-14 bits as well as 2-stage VQ at 18-22 bits. It is concluded that the cepstral coefficients are the right representation for LPC spectral coding purposes.","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132060848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Segmentation of speech using speaker identification
L. Wilcox, Francine R. Chen, Don Kimber, V. Balasubramanian
{"title":"Segmentation of speech using speaker identification","authors":"L. Wilcox, Francine R. Chen, Don Kimber, V. Balasubramanian","doi":"10.1109/ICASSP.1994.389330","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389330","url":null,"abstract":"This paper describes techniques for segmentation of conversational speech based on speaker identity. Speaker segmentation is performed using Viterbi decoding on a hidden Markov model network consisting of interconnected speaker sub-networks. Speaker sub-networks are initialized using Baum-Welch training on data labeled by speaker, and are iteratively retrained based on the previous segmentation. If data labeled by speaker is not available, agglomerative clustering is used to approximately segment the conversational speech according to speaker prior to Baum-Welch training. The distance measure for the clustering is a likelihood ratio in which speakers are modeled by Gaussian distributions. The distance between merged segments is recomputed at each stage of the clustering, and a duration model is used to bias the likelihood ratio. Segmentation accuracy using agglomerative clustering initialization matches accuracy using initialization with speaker labeled data.","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132328585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 96
A novel tree-structured video coder
F. D. Natale, G. Desoli, D. Giusto
{"title":"A novel tree-structured video coder","authors":"F. D. Natale, G. Desoli, D. Giusto","doi":"10.1109/ICASSP.1994.389465","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389465","url":null,"abstract":"A novel approach to video coding at very low bit rates is presented, which differs significantly from most of previous approaches, as it uses a spline-like interpolation scheme in a spatiotemporal domain. This operator is applied to a non-uniform 3D grid (built on sets of consecutive frames) so as to allocate the information adaptively. The proposed method allows a full exploitation of intra/inter-frame correlations and a good objective and visual quality of the reconstructed sequences.","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132515597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
On the equivalence between Gamma and Laguerre filters
T. E. O. Silva
{"title":"On the equivalence between Gamma and Laguerre filters","authors":"T. E. O. Silva","doi":"10.1109/ICASSP.1994.389800","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389800","url":null,"abstract":"Proves the equivalence between the Gamma and Laguerre filters. Applying the optimal conditions for Gamma filters, which are easy to obtain, the author arrives at the optimal conditions for Laguerre filters. Curiously these conditions are the same as those of a truncated Laguerre series approximation, which corresponds to the usage of an impulse as the input of the Laguerre filter. The author illustrates these results with an example. The author also investigates the relative merits of both structures in an adaptive filter setup.","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129981240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Statistical analysis of the median based multi-shell order-statistics filters
J. J. Li, A. Ramsingh
{"title":"Statistical analysis of the median based multi-shell order-statistics filters","authors":"J. J. Li, A. Ramsingh","doi":"10.1109/ICASSP.1994.389510","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389510","url":null,"abstract":"The multi-shell median filters have been shown to be effective in preserving image details as well as in the suppression of impulsive noise. In this paper, the statistical analysis of a general class of median based multi-shell order-statistics filters is presented. Using statistical threshold decomposition, together with a tri-tree structure, the statistical properties of the filters were derived. Based on the results, a 2-D nonlinear filter which is of good compromise between noise attenuation and detail preservation to fit various applications can be obtained.","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130131082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Spectrum reuse using transmitting antenna arrays with feedback
D. Gerlach, A. Paulraj
{"title":"Spectrum reuse using transmitting antenna arrays with feedback","authors":"D. Gerlach, A. Paulraj","doi":"10.1109/ICASSP.1994.389867","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389867","url":null,"abstract":"Currently, a central base station communicates simultaneously with several mobile users by allocating a separate time or frequency channel for each mobile to prevent undesired crosstalk. However, each time or frequency channel may be reused among several mobiles by means of an antenna array at the base station which points a separate beam at each user. The downlink beamformer would normally operate in an \"open loop\" mode, in which the base steers a mainlobe in the direction of each mobile. Such a system may operate effectively in a free space environment with no multipath. In the presence of scattering, open loop methods will not perform adequately. A new \"closed loop\" technique is presented in which each mobile user feeds back to the base estimates of the received signal amplitudes. Using feedback, the base station can achieve precision beamforming resulting in lower crosstalk and improved signal separation even in the presence of strong scattering environments.","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130131980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Channel equalization with perceptrons: an information-theoretic approach
T. Adalı, M. Sönmez
{"title":"Channel equalization with perceptrons: an information-theoretic approach","authors":"T. Adalı, M. Sönmez","doi":"10.1109/ICASSP.1994.390039","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.390039","url":null,"abstract":"We formulate the adaptive channel equalization as a conditional probability distribution learning problem. Conditional probability density function of the transmitted signal given the received signal is parametrized by a sigmoidal perceptron. In this framework, we use relative entropy (Kullback-Leibler distance) between the true and the estimated distributions as the cost function to be minimized. The true probabilities are approximated by their stochastic estimators resulting in a stochastic relative entropy cost function. This function is well-formed in the sense of Wittner and Denker (1988), therefore gradient descent on this cost function is guaranteed to find a solution. The consistency and asymptotic normality of this learning scheme are shown via maximum partial likelihood estimation of logistic models. As a practical example, we demonstrate that the resulting algorithm successfully equalizes multipath channels.","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"9923 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130244413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
The voice across Japan database-the Japanese language contribution to Polyphone
Thomas Staples, J. Picone, Nozomi Arai
{"title":"The voice across Japan database-the Japanese language contribution to Polyphone","authors":"Thomas Staples, J. Picone, Nozomi Arai","doi":"10.1109/ICASSP.1994.389348","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389348","url":null,"abstract":"Texas Instruments' Voice Across Japan (VAJ) database, modeled after the highly successful Voice Across America project, consists of a wide range of diverse speech material including digit strings, yes/no questions, and phonetically-rich read sentences. The data is being collected using long distance telephone lines and an analog telephone interface. The target size is 14 items per speaker by 10,000 speakers. Greater emphasis is being placed on the collection of phonetically-rich read sentence data. Four randomly selected sentences are included in each session: one from the 512 sentence ATR PB set, and three from a 10,000 sentence set developed specifically for this project. This latter sentence set, designed to maximize the triphone coverage of the database, is described. The VAJ database is planned to be included in the Linguistic Data Consortium's (LDC) Polyphone (multi-language) database.","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"29 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134041950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10