2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100): latest papers

Speech quality objective assessment using neural network
Q. Fu, Kechu Yi, Mingui Sun
{"title":"Speech quality objective assessment using neural network","authors":"Q. Fu, Kechu Yi, Mingui Sun","doi":"10.1109/ICASSP.2000.861932","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861932","url":null,"abstract":"This paper presents a novel method for objective assessment of speech quality based on a one-step strategy using a feedforward neural network. Currently, almost all existing methods for this assessment follow a two-step strategy, requiring a distortion computation and a mapping from the average distortion value to the mean opinion score (MOS). Our new method combines these two steps by means of a neural network which can incorporate the perceptual properties of the human auditory system and provide an MOS estimate directly. Our theoretical analysis and experimental results suggest that this method of MOS estimation significantly outperforms the traditional methods. The correlation coefficient between the subjective test score and the objective MOS estimate can reach about 0.95.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125741082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
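The figure of merit quoted in the abstract above is a correlation coefficient between subjective test scores and objective MOS estimates. The snippet below is only a minimal sketch of how such a coefficient could be computed, assuming the ordinary Pearson definition (the abstract does not say which definition was used). The function name `pearson` and the sample scores are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustration data, NOT results from the paper.
subjective_mos = [1.8, 2.5, 3.1, 3.9, 4.4]
objective_mos = [2.0, 2.4, 3.3, 3.7, 4.5]
print(round(pearson(subjective_mos, objective_mos), 3))
```

A value close to 1 would indicate that the objective estimate ranks and spaces the test conditions much as listeners do.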
Projective residual vector quantization and mapped residual pooling
Ryan P. Thomas, T. Moon
{"title":"Projective residual vector quantization and mapped residual pooling","authors":"Ryan P. Thomas, T. Moon","doi":"10.1109/ICASSP.2000.859197","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859197","url":null,"abstract":"This paper points out two potential problems with residual vector quantization (RVQ): tree entanglement and non-projectiveness of the quantizer. The use of a boundary normalization mapping is proposed to pool all quantization residuals at a stage into identically-shaped regions, reducing or eliminating entanglement. Also, a reconstruction codebook is proposed to eliminate the non-projectiveness. Results are presented on both random and image data.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"39 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123281843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
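For readers unfamiliar with the baseline that the abstract above criticizes, the following is a rough sketch of plain multi-stage residual vector quantization: each stage quantizes the residual left by the previous one. It does not implement the paper's boundary normalization mapping or reconstruction codebook. The function names and the toy codebooks are assumptions made for illustration only.

```python
def nearest(codebook, vector):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def rvq_encode(stages, vector):
    """Quantize `vector` stage by stage; return the indices and final residual."""
    indices = []
    residual = list(vector)
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        # The next stage sees only what this stage failed to represent.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices, residual


def rvq_decode(stages, indices):
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Toy two-stage codebooks (hypothetical values, chosen so the arithmetic is exact).
stages = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
indices, residual = rvq_encode(stages, [5.0, 4.0])
print(indices, residual, rvq_decode(stages, indices))
```

Because every stage reuses one codebook for all residuals regardless of which first-stage cell they came from, the stages are not independent, which is the setting in which the problems named in the abstract arise.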
Implementing a high accuracy speaker-independent continuous speech recognizer on a fixed-point DSP
Y. Gong, Yu-Hung Kao
{"title":"Implementing a high accuracy speaker-independent continuous speech recognizer on a fixed-point DSP","authors":"Y. Gong, Yu-Hung Kao","doi":"10.1109/ICASSP.2000.860202","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.860202","url":null,"abstract":"Continuous speech recognition is a resource-intensive algorithm. Commercial dictation software requires more than 10 Mbytes to install on the disk and 32 Mbytes of RAM to run the application. A typical embedded system cannot afford this much RAM because of its high cost and power consumption; it also lacks a disk to store the large amount of static data (e.g. acoustic models). We have been working on optimization of a small vocabulary speech recognizer suitable for implementation on a 16-bit fixed-point DSP. This recognizer supports sophisticated continuous density, tied-mixtures Gaussians, parallel model combination, and a noise-robust utterance detection algorithm. The fixed-point version achieves the same performance as the floating-point version. The algorithm runs in real time on a 100 MHz, 16-bit, fixed-point Texas Instruments TMS320C5410 even for the most challenging continuous digit dialing with hands-free microphone in driving conditions.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125573153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Concatenating syllables for response generation in spoken language applications
T. Fung, H. Meng
{"title":"Concatenating syllables for response generation in spoken language applications","authors":"T. Fung, H. Meng","doi":"10.1109/ICASSP.2000.859114","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859114","url":null,"abstract":"We describe our approach to developing a speech synthesis technique for response generation in domain-specific spoken language applications. Our approach handles two Chinese dialects, Cantonese and Putonghua. We chose the foreign exchange domain, and worked with its constrained vocabulary and response expressions. The syllable is selected to be our basic unit for concatenation. Each unit label includes a two-digit appendix to encode the distinctive features of the left and right coarticulatory context. Our approach attempts to maximize intelligibility and naturalness of the responses within the application domain. Hence the synthesized outputs compare favorably with a domain-independent TD-PSOLA synthesizer.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"2009 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125588187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Soft GPD for minimum classification error rate training
Bertram E. Shi, K. Yao, Z. Cao
{"title":"Soft GPD for minimum classification error rate training","authors":"Bertram E. Shi, K. Yao, Z. Cao","doi":"10.1109/ICASSP.2000.861803","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861803","url":null,"abstract":"Minimum classification error (MCE) rate training is a discriminative training method which seeks to minimize an empirical estimate of the error probability derived over a training set. The segmental generalized probabilistic descent (GPD) algorithm for MCE uses the log likelihood of the best path as a discriminant function to estimate the error probability. This paper shows that by using a discriminant function similar to the auxiliary function used in EM, we can obtain a \"soft\" version of GPD in the sense that information about all possible paths is retained. Complexity is similar to segmental GPD. For certain parameter values, the algorithm is equivalent to segmental GPD. By modifying the misclassification measure usually used, we can obtain an algorithm for embedded MCE training for continuous speech which does not require a separate N-best search to determine competing classes. Experimental results show error rate reduction of 20% compared with maximum likelihood training.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126628896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Infinitely divisible cascade analysis of network traffic data
D. Veitch, P. Abry, P. Flandrin, P. Chainais
{"title":"Infinitely divisible cascade analysis of network traffic data","authors":"D. Veitch, P. Abry, P. Flandrin, P. Chainais","doi":"10.1109/ICASSP.2000.861931","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861931","url":null,"abstract":"Infinitely divisible cascades are a model class previously introduced in the field of turbulence to describe the statistics of velocity fields. In this paper, using a wavelet reformulation of the cascades, we investigate their ability to analyze band model scaling properties of data and compare their fundamental ingredients to those of other scaling model classes such as self-similar and multifractal processes. We also propose an estimation procedure for the propagator or kernel of the cascades. Finally the cascade model is successfully applied to describe Internet TCP network traffic data, bringing new insights into their scaling properties and revealing a pitfall in existing techniques.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126637684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 44
Exploring permutation inconsistency in blind separation of speech signals in a reverberant environment
M. Ikram, D. Morgan
{"title":"Exploring permutation inconsistency in blind separation of speech signals in a reverberant environment","authors":"M. Ikram, D. Morgan","doi":"10.1109/ICASSP.2000.859141","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859141","url":null,"abstract":"We study and explore the limitations of methods for blind separation of a mixture of multiple speakers in a real reverberant environment. To support our results, we analyze a frequency-domain method, which achieves blind source separation (BSS) by transforming the time-domain convolutive problem to multiple short-term problems in the frequency domain. We show that treating the problem independently at different frequency bins introduces a \"permutation inconsistency\" problem, which becomes worse as the length of room impulse response increases. Our studies prove that the ideas proposed in the existing literature are not capable of effectively handling this problem and a need exists for its satisfactory solution. We speculate that time-domain BSS techniques may also suffer from an equivalent permutation inconsistency problem when long un-mixing filters are used.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126648196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 88
Reconstruction of chaotic dynamics using a noise-robust embedding method
W. Yoshida, S. Ishii, Masa-aki Sato
{"title":"Reconstruction of chaotic dynamics using a noise-robust embedding method","authors":"W. Yoshida, S. Ishii, Masa-aki Sato","doi":"10.1109/ICASSP.2000.861907","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861907","url":null,"abstract":"In this article, we discuss the reconstruction of chaotic dynamics in a partial observation situation. As a function approximator, we employ a normalized Gaussian network (NGnet), which is trained by an on-line EM algorithm. In order to deal with the partial observation, we propose a new embedding method based on smoothing filters, which is called integral embedding. The NGnet is trained to learn the dynamical system in the integral coordinate space. Experimental results show that the trained NGnet is able to reproduce a chaotic attractor that well approximates the complexity and instability of the original chaotic attractor, even when the data involve large noise. In comparison with our previous method using delay coordinate embedding, this new method is more robust to noise and faster in learning.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114896989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
An oblivious robust digital watermark technique for still images using DCT phase modulation
Faisal Alturki, R. Mersereau
{"title":"An oblivious robust digital watermark technique for still images using DCT phase modulation","authors":"Faisal Alturki, R. Mersereau","doi":"10.1109/ICASSP.2000.859218","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859218","url":null,"abstract":"Digital watermarking is the process of secretly embedding a short sequence of information inside a digital source without changing its perceptual quality. We present a new oblivious digital watermarking method for copyright protection of still images. The technique is based on modifying the sign of a subset of low frequency DCT magnitude coefficients. The robustness to a number of standard image processing attacks is demonstrated using the criteria of the latest Stirmark test.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115559120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Bias of feedback cancellation algorithms based on direct closed loop identification
J. Hellgren, U. Forssell
{"title":"Bias of feedback cancellation algorithms based on direct closed loop identification","authors":"J. Hellgren, U. Forssell","doi":"10.1109/ICASSP.2000.859098","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.859098","url":null,"abstract":"An adaptive filter can be used to cancel the undesired acoustic feedback of hearing aids. The adaptive algorithm studied in this paper uses the output and input signal of the hearing aid to continuously track the acoustic feedback path. The bias of the optimal estimate with a quadratic norm is analyzed. The results show the importance of having a good model of the input signal to the hearing aid, as the error in this model will introduce bias in the estimate of the feedback path.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"387 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115990733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7