IEEE Trans. Speech Audio Process.最新文献_第3页

ANIQUE: An Auditory Model for Single-Ended Speech Quality Estimation ANIQUE:单端语音质量估计的听觉模型

IEEE Trans. Speech Audio Process. Pub Date : 2005-08-15 DOI: 10.1109/TSA.2005.851924

Doh-Suk Kim

引用次数: 115

Combination of autocorrelation-based features and projection measure technique for speaker identification 基于自相关特征与投影测量相结合的说话人识别技术

IEEE Trans. Speech Audio Process. Pub Date : 2005-06-20 DOI: 10.1109/TSA.2005.848893

Kuo-Hwei Yuo, Tai-Hwei Hwang, Hsiao-Chuan Wang

引用次数: 18

Rapid online adaptation based on transformation space model evolution 基于变换空间模型演化的快速在线自适应

IEEE Trans. Speech Audio Process. Pub Date : 2005-02-22 DOI: 10.1109/TSA.2004.841427

Dong Kook Kim, N. Kim

引用次数: 5

Crosstalk resilient interference cancellation in microphone arrays using Capon beamforming

IEEE Trans. Speech Audio Process. Pub Date : 2004-08-16 DOI: 10.1109/TSA.2004.833011

Wing-Kin Ma, P. Ching, B. Vo

{"title":"Crosstalk resilient interference cancellation in microphone arrays using Capon beamforming","authors":"Wing-Kin Ma, P. Ching, B. Vo","doi":"10.1109/TSA.2004.833011","DOIUrl":"https://doi.org/10.1109/TSA.2004.833011","url":null,"abstract":"This paper studies a reference-assisted approach for interference canceling (IC) in microphone array systems. Conventionally, reference-assisted IC is based on the zero crosstalk assumption; i.e., when the desired source signal is absent in the reference microphones. In applications where crosstalk is inevitable, the conventional IC approach usually exhibits degraded performance due to cancellation of the desired signal. In this paper, we develop a crosstalk resilient IC method based on the Capon beamforming technique. The proposed beamformer deals with the uncertainty of crosstalk by applying a constraint on the worst-case crosstalk magnitude. The proposed beamformer not only performs IC, it also provides blind beamforming of the desired signal. We show that a blind beamformer based on the traditional minimum-mean-square-error (MMSE) IC method is a special case of the proposed beamformer. One key step of implementing the proposed Capon beamformer lies in solving a difficult nonconvex optimization problem, and we illustrate how the Capon optimal solution can be effectively approximated using the so-called semidefinite relaxation algorithm. Simulation results demonstrate that the proposed beamformer is more robust against crosstalk-induced signal cancellation than beamformers based on the MMSE-IC methods.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"2 1","pages":"468-477"},"PeriodicalIF":0.0,"publicationDate":"2004-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76642765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8

Introduction to the Special Issue on Multichannel Signal Processing for Audio and Acoustics Applications 音频和声学应用中的多通道信号处理特刊导论

IEEE Trans. Speech Audio Process. Pub Date : 2004-08-16 DOI: 10.1109/TSA.2004.833716

Walter Kellermann, M. Sondhi, D. DeVries

{"title":"Introduction to the Special Issue on Multichannel Signal Processing for Audio and Acoustics Applications","authors":"Walter Kellermann, M. Sondhi, D. DeVries","doi":"10.1109/TSA.2004.833716","DOIUrl":"https://doi.org/10.1109/TSA.2004.833716","url":null,"abstract":"HE IEEE Signal Processing Society has its roots in an area where acoustics, speech, and signal processing converge, as was reflected in the former name of the society when it was founded in 1974. The interface between acoustics, speech, and signal processing is still an area of great interest to the society, with many fundamental problems still unsolved. Research is driven by applications where acoustic signals have to be captured, transmitted, and/or reproduced in an acoustic environment that includes echoes, noise, and reverberation Considering human/machine interfaces as a major area of applications, it is obvious that signal processing becomes more challenging as the distance between humans and the machines increases, as the signal bandwidth increases, and as the acoustic environment becomes more complex and hostile. Increasingly sophisticated algorithms have been developed since the mid-1970s and along with the availability of greatly increased and affordable computational power, multichannel signal processing algorithms naturally evolved for exploiting the spatial dimension of acoustic signals. The importance and popularity of this field was well reflected by the large number of submissions to this special issue. The volume of high-quality papers could not be fitted into the page budget allotted to us. Thus, we regrettably had to decide to publish some of them in a second section of this special issue as part of a regular issue of the TRANSACTIONS in early 2005. For sound reproduction, where we want to provide a pair of desired signals at the listeners’ ear drums, seamless human/machine interfaces based on multichannel techniques have been implemented since the invention of stereo systems. However, providing the true spatial sound experience in large listening spaces became possible only with new multichannel signal processing techniques, such as wavefield synthesis. Still, major challenges remain, especially phase-true equalization of listening room acoustics and the cancellation of local noise sources and interferers. On the other hand, acquisition of audio and speech signals has been a research topic since the invention of the microphone and still today presents major challenges for the signal processing community. Structurally the simplest problem, the acoustic feedback from loudspeakers into microphones is addressed by acoustic echo cancellation: From the single-channel case which has been investigated since the 1970s, research has moved on to stereo and multichannel reproduction, recently culminating in a new wave-domain adaptive filtering concept which has been presented for the first time at ICASSP 2004. For removing unwanted interference and noise from desired signals, multichannel techniques utilize spatial diversity to discriminate between desired and undesired components, either by exploiting different spatial coherence properties or by beamforming, which directs a beam of increased sensitivity towards the desired source. For tr","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"11 1","pages":"449-450"},"PeriodicalIF":0.0,"publicationDate":"2004-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87052996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Introduction to the Special Issue on Spontaneous Speech Processing 自发语音处理专题导论

IEEE Trans. Speech Audio Process. Pub Date : 2004-06-21 DOI: 10.1109/TSA.2004.828628

S. Furui, M. Beckman, Julia Hirschberg, S. Itahashi, Tatsuya Kawahara, Satoshi Nakamura, Shrikanth S. Narayanan

引用次数: 5

From the Editor-in-Chief 来自总编辑

IEEE Trans. Speech Audio Process. Pub Date : 2004-01-01 DOI: 10.1109/TSA.2004.837946

I. Trancoso

引用次数: 0

Source localization in reverberant environments: modeling and statistical analysis 混响环境中的声源定位:建模和统计分析

IEEE Trans. Speech Audio Process. Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818027

T. Gustafsson, B. Rao, M. Trivedi

引用次数: 167

Robust time delay estimation exploiting redundancy among multiple microphones 利用多传声器冗余的鲁棒时延估计

IEEE Trans. Speech Audio Process. Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818025

Jingdong Chen, J. Benesty, Yiteng Huang

引用次数: 158

Robust recognition of children's speech 对儿童言语的强大识别

IEEE Trans. Speech Audio Process. Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818026

A. Potamianos, Shrikanth S. Narayanan

{"title":"Robust recognition of children's speech","authors":"A. Potamianos, Shrikanth S. Narayanan","doi":"10.1109/TSA.2003.818026","DOIUrl":"https://doi.org/10.1109/TSA.2003.818026","url":null,"abstract":"Developmental changes in speech production introduce age-dependent spectral and temporal variability in the speech signal produced by children. Such variabilities pose challenges for robust automatic recognition of children's speech. Through an analysis of age-related acoustic characteristics of children's speech in the context of automatic speech recognition (ASR), effects such as frequency scaling of spectral envelope parameters are demonstrated. Recognition experiments using acoustic models trained from adult speech and tested against speech from children of various ages clearly show performance degradation with decreasing age. On average, the word error rates are two to five times worse for children speech than for adult speech. Various techniques for improving ASR performance on children's speech are reported. A speaker normalization algorithm that combines frequency warping and model transformation is shown to reduce acoustic variability and significantly improve ASR performance for children speakers (by 25-45% under various model training and testing conditions). The use of age-dependent acoustic models further reduces word error rate by 10%. The potential of using piece-wise linear and phoneme-dependent frequency warping algorithms for reducing the variability in the acoustic feature space of children is also investigated.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"7 1","pages":"603-616"},"PeriodicalIF":0.0,"publicationDate":"2003-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76983919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 213