{"title":"Reverberation and Noise Robust Feature Compensation Based on IMM","authors":"C. Han, S. Kang, N. Kim","doi":"10.1109/TASL.2013.2256893","DOIUrl":"https://doi.org/10.1109/TASL.2013.2256893","url":null,"abstract":"In this paper, we propose a novel feature compensation approach based on the interacting multiple model (IMM) algorithm specially designed for joint processing of background noise and acoustic reverberation. Our approach to cope with the time-varying environmental parameters is to establish a switching linear dynamic model for the additive and convolutive distortions, such as the background noise and acoustic reverberation, in the log-spectral domain. We construct multiple state space models with the speech corruption process in which the log spectra of clean speech and log frequency response of acoustic reverberation are jointly handled as the state of our interest. The proposed approach shows significant improvements in the Aurora-5 automatic speech recognition (ASR) task which was developed to investigate the influence on the performance of ASR for a hands-free speech input in noisy room environments.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2256893","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MDCT Sinusoidal Analysis for Audio Signals Analysis and Processing","authors":"Shuhua Zhang, W. Dou, Huazhong Yang","doi":"10.1109/TASL.2013.2250963","DOIUrl":"https://doi.org/10.1109/TASL.2013.2250963","url":null,"abstract":"The Modified Discrete Cosine Transform (MDCT) is widely used in audio signals compression, but mostly limited to representing audio signals. This is because the MDCT is a real transform: Phase information is missing and spectral power varies frame to frame even for pure sine waves. We have a key observation concerning the structure of the MDCT spectrum of a sine wave: Across frames, the complete spectrum changes substantially, but if separated into even and odd subspectra, neither changes except scaling. Inspired by this observation, we find that the MDCT spectrum of a sine wave can be represented as an envelope factor times a phase-modulation factor. The first one is shift-invariant and depends only on the sine wave's amplitude and frequency, thus stays constant over frames. The second one has the form of sinθ for all odd bins and cosθ for all even bins, leading to subspectra's constant shapes. But this θ depends on the start point of a transform frame, therefore, changes at each new frame, and then changes the whole spectrum. We apply this formulation of the MDCT spectral structure to frequency estimation in the MDCT domain, both for pure sine waves and sine waves with noises. Compared to existing methods, ours are more accurate and more general (not limited to the sine window). We also apply the spectral structure to stereo coding. A pure tone or tone-dominant stereo signal may have very different left and right MDCT spectra, but their subspectra have similar shapes. One ratio for even bins and one ratio for odd bins will be enough to reconstruct the right from the left, saving half bitrate. This scheme is simple and at the same time more efficient than the traditional Intensity Stereo (IS).","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2250963","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Symmetric Kernel Partial Least Squares Framework for Speaker Recognition","authors":"Balaji Vasan Srinivasan, Yuancheng Luo, D. Garcia-Romero, D. Zotkin, R. Duraiswami","doi":"10.1109/TASL.2013.2253096","DOIUrl":"https://doi.org/10.1109/TASL.2013.2253096","url":null,"abstract":"I-vectors are concise representations of speaker characteristics. Recent progress in i-vectors related research has utilized their ability to capture speaker and channel variability to develop efficient automatic speaker verification (ASV) systems. Inter-speaker relationships in the i-vector space are non-linear. Accomplishing effective speaker verification requires a good modeling of these non-linearities and can be cast as a machine learning problem. Kernel partial least squares (KPLS) can be used for discriminative training in the i-vector space. However, this framework suffers from training data imbalance and asymmetric scoring. We use “one shot similarity scoring” (OSS) to address this. The resulting ASV system (OSS-KPLS) is tested across several conditions of the NIST SRE 2010 extended core data set and compared against state-of-the-art systems: Joint Factor Analysis (JFA), Probabilistic Linear Discriminant Analysis (PLDA), and Cosine Distance Scoring (CDS) classifiers. Improvements are shown.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2253096","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-Based Inversion of Dynamic Range Compression","authors":"Stanislaw Gorlow, J. Reiss","doi":"10.1109/TASL.2013.2253099","DOIUrl":"https://doi.org/10.1109/TASL.2013.2253099","url":null,"abstract":"In this work it is shown how a dynamic nonlinear time-variant operator, such as a dynamic range compressor, can be inverted using an explicit signal model. By knowing the model parameters that were used for compression one is able to recover the original uncompressed signal from a “broadcast” signal with high numerical accuracy and very low computational complexity. A compressor-decompressor scheme is worked out and described in detail. The approach is evaluated on real-world audio material with great success.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2253099","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CLOSE—A Data-Driven Approach to Speech Separation","authors":"J. Ming, R. Srinivasan, D. Crookes, Ayeh Jafari","doi":"10.1109/TASL.2013.2250959","DOIUrl":"https://doi.org/10.1109/TASL.2013.2250959","url":null,"abstract":"This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error of separation. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2250959","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Perceptual Study on Velvet Noise and Its Variants at Different Pulse Densities","authors":"V. Välimäki, Heidi-Maria Lehtonen, M. Takanen","doi":"10.1109/TASL.2013.2255281","DOIUrl":"https://doi.org/10.1109/TASL.2013.2255281","url":null,"abstract":"This paper investigates sparse noise sequences, including the previously proposed velvet noise and its novel variants defined here. All sequences consist of sample values minus one, zero, and plus one only, and the location and the sign of each impulse is randomly chosen. Two of the proposed algorithms are direct variants of the original velvet noise requiring two random number sequences for determining the impulse locations and signs. In one of the proposed algorithms the impulse locations and signs are drawn from the same random number sequence, which is advantageous in terms of implementation. Moreover, two of the new sequences include known regions of zeros. The perceived smoothness of the proposed sequences was studied with a listening test in which test subjects compared the noise sequences against a reference signal that was a Gaussian white noise. The results show that the original velvet noise sounds smoother than the reference at 2000 impulses per second. At 4000 impulses per second, also three of the proposed algorithms are perceived smoother than the Gaussian noise sequence. These observations can be exploited in the synthesis of noisy sounds and in artificial reverberation.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2255281","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic-Deterministic MMSE STFT Speech Enhancement With General A Priori Information","authors":"Matthew C. McCallum, B. Guillemin","doi":"10.1109/TASL.2013.2253100","DOIUrl":"https://doi.org/10.1109/TASL.2013.2253100","url":null,"abstract":"A wide range of Bayesian short-time spectral amplitude (STSA) speech enhancement algorithms exist, varying in both the statistical model used for speech and the cost functions considered. Current algorithms of this class consistently assume that the distribution of clean speech short time Fourier transform (STFT) samples are either randomly distributed with zero mean or deterministic. No single distribution function has been considered that captures both deterministic and random signal components. In this paper a Bayesian STSA algorithm is proposed under a stochastic-deterministic (SD) speech model that makes provision for the inclusion of a priori information by considering a non-zero mean. Analytical expressions are derived for the speech STFT magnitude in the MMSE sense, and phase in the maximum-likelihood sense. Furthermore, a practical method of estimating the a priori SD speech model parameters is described based on explicit consideration of harmonically related sinusoidal components in each STFT frame, and variations in both the magnitude and phase of these components between successive STFT frames. Objective tests using the PESQ measure indicate that the proposed algorithm results in superior speech quality when compared to several other speech enhancement algorithms. In particular it is clear that the proposed algorithm has an improved capability to retain low amplitude voiced speech components in low SNR conditions.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2253100","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Scaling Up Classification-Based Speech Separation","authors":"Yuxuan Wang, Deliang Wang","doi":"10.1109/TASL.2013.2250961","DOIUrl":"https://doi.org/10.1109/TASL.2013.2250961","url":null,"abstract":"Formulating speech separation as a binary classification problem has been shown to be effective. While good separation performance is achieved in matched test conditions using kernel support vector machines (SVMs), separation in unmatched conditions involving new speakers and environments remains a big challenge. A simple yet effective method to cope with the mismatch is to include many different acoustic conditions into the training set. However, large-scale training is almost intractable for kernel machines due to computational complexity. To enable training on relatively large datasets, we propose to learn more linearly separable and discriminative features from raw acoustic features and train linear SVMs, which are much easier and faster to train than kernel SVMs. For feature learning, we employ standard pre-trained deep neural networks (DNNs). The proposed DNN-SVM system is trained on a variety of acoustic conditions within a reasonable amount of time. Experiments on various test mixtures demonstrate good generalization to unseen speakers and background noises.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2250961","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Reverberant Audio Source Separation via Reweighted Analysis","authors":"S. Arberet, P. Vandergheynst, R. Carrillo, J. Thiran, Y. Wiaux","doi":"10.1109/TASL.2013.2250962","DOIUrl":"https://doi.org/10.1109/TASL.2013.2250962","url":null,"abstract":"We propose a novel algorithm for source signals estimation from an underdetermined convolutive mixture assuming known mixing filters. Most of the state-of-the-art methods are dealing with anechoic or short reverberant mixture, assuming a synthesis sparse prior in the time-frequency domain and a narrowband approximation of the convolutive mixing process. In this paper, we address the source estimation of convolutive mixtures with a new algorithm based on i) an analysis sparse prior, ii) a reweighting scheme so as to increase the sparsity, iii) a wideband data-fidelity term in a constrained form. We show, through theoretical discussions and simulations, that this algorithm is particularly well suited for source separation of realistic reverberation mixtures. Particularly, the proposed algorithm outperforms state-of-the-art methods on reverberant mixtures of audio sources by more than 2 dB of signal-to-distortion ratio on the BSS Oracle dataset.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2250962","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-Based Unsupervised Spoken Term Detection with Spoken Queries","authors":"Chun-an Chan, Lin-Shan Lee","doi":"10.1109/TASL.2013.2248714","DOIUrl":"https://doi.org/10.1109/TASL.2013.2248714","url":null,"abstract":"We present a set of model-based approaches for unsupervised spoken term detection (STD) with spoken queries that requires neither speech recognition nor annotated data. This work shows the possibilities in migrating from DTW-based to model-based approaches for unsupervised STD. The proposed approach consists of three components: self-organizing models, query matching, and query modeling. To construct the self-organizing models, repeated patterns are captured and modeled using acoustic segment models (ASMs). In the query matching phase, a document state matching (DSM) approach is proposed to represent documents as ASM sequences, which are matched to the query frames. In this way, not only do the ASMs better model the signal distributions and time trajectories of speech, but the much-smaller number of states than frames for the documents leads to a much lower computational load. A novel duration-constrained Viterbi (DC-Vite) algorithm is further proposed for the above matching process to handle the speaking rate distortion problem. In the query modeling phase, a pseudo likelihood ratio (PLR) approach is proposed in the pseudo relevance feedback (PRF) framework. A likelihood ratio evaluated with query/anti-query HMMs trained with pseudo relevant/irrelevant examples is used to verify the detected spoken term hypotheses. The proposed framework demonstrates the usefulness of ASMs for STD in zero-resource settings and the potential of an instantly responding STD system using ASM indexing. The best performance is achieved by integrating DTW-based approaches into the rescoring steps in the proposed framework. Experimental results show an absolute 14.2% of mean average precision improvement with 77% CPU time reduction compared with the segmental DTW approach on a Mandarin broadcast news corpus. Consistent improvements were found on TIMIT and MediaEval 2011 Spoken Web Search corpus.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2248714","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}