IEEE Transactions on Audio Speech and Language Processing最新文献_第2页

Declipping of Audio Signals Using Perceptual Compressed Sensing 基于感知压缩感知的音频信号降噪

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2281570

Bruno Defraene, Naim Mansour, S. D. Hertogh, T. Waterschoot, M. Diehl, M. Moonen

引用次数: 56

Understanding Effects of Subjectivity in Measuring Chord Estimation Accuracy 认识主观性对弦估计精度测量的影响

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2280218

Y. Ni, Matt McVicar, Raúl Santos-Rodríguez, T. D. Bie

引用次数: 42

A Bag of Systems Representation for Music Auto-Tagging 音乐自动标注的一种系统表示

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2279318

Katherine Ellis, E. Coviello, Antoni B. Chan, Gert R. G. Lanckriet

引用次数: 22

Geometry-Based Spatial Sound Acquisition Using Distributed Microphone Arrays 基于几何的分布式麦克风阵列空间声音采集

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2280210

O. Thiergart, G. D. Galdo, Maja Taseska, Emanuël Habets

{"title":"Geometry-Based Spatial Sound Acquisition Using Distributed Microphone Arrays","authors":"O. Thiergart, G. D. Galdo, Maja Taseska, Emanuël Habets","doi":"10.1109/TASL.2013.2280210","DOIUrl":"https://doi.org/10.1109/TASL.2013.2280210","url":null,"abstract":"Traditional spatial sound acquisition aims at capturing a sound field with multiple microphones such that at the reproduction side a listener can perceive the sound image as it was at the recording location. Standard techniques for spatial sound acquisition usually use spaced omnidirectional microphones or coincident directional microphones. Alternatively, microphone arrays and spatial filters can be used to capture the sound field. From a geometric point of view, the perspective of the sound field is fixed when using such techniques. In this paper, a geometry-based spatial sound acquisition technique is proposed to compute virtual microphone signals that manifest a different perspective of the sound field. The proposed technique uses a parametric sound field model that is formulated in the time-frequency domain. It is assumed that each time-frequency instant of a microphone signal can be decomposed into one direct and one diffuse sound component. It is further assumed that the direct component is the response of a single isotropic point-like source (IPLS) of which the position is estimated for each time-frequency instant using distributed microphone arrays. Given the sound components and the position of the IPLS, it is possible to synthesize a signal that corresponds to a virtual microphone at an arbitrary position and with an arbitrary pick-up pattern.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2583-2594"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2280210","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62892547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 45

Source/Filter Factorial Hidden Markov Model, With Application to Pitch and Formant Tracking 源/滤波器阶乘隐马尔可夫模型及其在基音和峰跟踪中的应用

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2277941

Jean-Louis Durrieu, J. Thiran

{"title":"Source/Filter Factorial Hidden Markov Model, With Application to Pitch and Formant Tracking","authors":"Jean-Louis Durrieu, J. Thiran","doi":"10.1109/TASL.2013.2277941","DOIUrl":"https://doi.org/10.1109/TASL.2013.2277941","url":null,"abstract":"Tracking vocal tract formant frequencies <formula formulatype=\"inline\"> <tex Notation=\"TeX\">$(f_{p})$</tex></formula> and estimating the fundamental frequency <formula formulatype=\"inline\"><tex Notation=\"TeX\">$(f_{0})$</tex> </formula> are two tracking problems that have been tackled in many speech processing works, often independently, with applications to articulatory parameters estimations, speech analysis/synthesis or linguistics. Many works assume an auto-regressive (AR) model to fit the spectral envelope, hence indirectly estimating the formant tracks from the AR parameters. However, directly estimating the formant frequencies, or equivalently the poles of the AR filter, allows to further model the smoothness of the desired tracks. In this paper, we propose a Factorial Hidden Markov Model combined with a vocal source/filter model, with parameters naturally encoding the <formula formulatype=\"inline\"><tex Notation=\"TeX\">$f_{0}$</tex></formula> and <formula formulatype=\"inline\"> <tex Notation=\"TeX\">$f_{p}$</tex></formula> tracks. Two algorithms are proposed, with two different strategies: first, a simplification of the underlying model, with a parameter estimation based on variational methods, and second, a sparse decomposition of the signal, based on Non-negative Matrix Factorization methodology. The results are comparable to state-of-the-art formant tracking algorithms. With the use of a complete production model, the proposed systems provide robust formant tracks which can be used in various applications. The algorithms could also be extended to deal with multiple-speaker signals.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2541-2553"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2277941","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62892418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8

HMM Based Intermediate Matching Kernel for Classification of Sequential Patterns of Speech Using Support Vector Machines 基于HMM的中间匹配核支持向量机语音序列模式分类

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2279338

A. D. Dileep, C. Sekhar

{"title":"HMM Based Intermediate Matching Kernel for Classification of Sequential Patterns of Speech Using Support Vector Machines","authors":"A. D. Dileep, C. Sekhar","doi":"10.1109/TASL.2013.2279338","DOIUrl":"https://doi.org/10.1109/TASL.2013.2279338","url":null,"abstract":"In this paper, we address the issues in the design of an intermediate matching kernel (IMK) for classification of sequential patterns using support vector machine (SVM) based classifier for tasks such as speech recognition. Specifically, we address the issues in constructing a kernel for matching sequences of feature vectors extracted from the speech signal data of utterances. The codebook based IMK and Gaussian mixture model (GMM) based IMK have been proposed earlier for matching the varying length patterns represented as sets of features vectors for tasks such as image classification and speaker recognition. These methods consider the centers of clusters and the components of GMM as the virtual feature vectors used in the design of IMK. As these methods do not use sequence information in matching the patterns, these methods are not suitable for matching sequential patterns. We propose the hidden Markov model (HMM) based IMK for matching sequential patterns of varying length. We consider two approaches to design the HMM-based IMK. In the first approach, each of the two sequences to be matched is segmented into subsequences with each subsequence aligned to a state of the HMM. Then the HMM-based IMK is constructed as a combination of state-specific GMM-based IMKs that match the subsequences aligned with the particular states of the HMM. In the second approach, the HMM-based IMK is constructed without segmenting sequences, and by matching the local feature vectors selected using the responsibility terms that account for being in a state and generating the feature vectors by a component of the GMM of that state. We study the performance of the SVM based classifiers using the proposed HMM-based IMK for recognition of isolated utterances of E-set in English alphabet and recognition of consonent–vowel segments in Hindi language.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2570-2582"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2279338","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62892483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 15

Improving Graph-Based Dependency Parsing Models With Dependency Language Models 用依赖语言模型改进基于图的依赖解析模型

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2273715

Min Zhang, Wenliang Chen, Xiangyu Duan, Rong Zhang

{"title":"Improving Graph-Based Dependency Parsing Models With Dependency Language Models","authors":"Min Zhang, Wenliang Chen, Xiangyu Duan, Rong Zhang","doi":"10.1109/TASL.2013.2273715","DOIUrl":"https://doi.org/10.1109/TASL.2013.2273715","url":null,"abstract":"For graph-based dependency parsing, how to enrich high-order features without increasing decoding complexity is a very challenging problem. To solve this problem, this paper presents an approach to representing high-order features for graph-based dependency parsing models using a dependency language model and beam search. Firstly, we use a baseline parser to parse a large-amount of unannotated data. Then we build the dependency language model (DLM) on the auto-parsed data. A set of new features is represented based on the DLM. Finally, we integrate the DLM-based features into the parsing model during decoding by beam search. We also utilize the features in bilingual text (bitext) parsing models. The main advantages of our approach are: 1) we utilize rich high-order features defined over a view of large scope and additional large raw corpus; 2) our approach does not increase the decoding complexity. We evaluate the proposed approach on the monotext and bitext parsing tasks. In the monotext parsing task, we conduct the experiments on Chinese and English data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data. In the bitext parsing task, we conduct the experiments on a Chinese-English bilingual data and our score is the best reported so far.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"14 1","pages":"2313-2323"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2273715","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62891502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Acoustic Modeling With Hierarchical Reservoirs 分层储层声学建模

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2280209

Fabian Triefenbach, A. Jalalvand, Kris Demuynck, J. Martens

引用次数: 72

Robust Ultra-Low Latency Soft-Decision Decoding of Linear PCM Audio 线性PCM音频的鲁棒超低延迟软判决解码

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2273716

Florian Pflug, T. Fingscheidt

{"title":"Robust Ultra-Low Latency Soft-Decision Decoding of Linear PCM Audio","authors":"Florian Pflug, T. Fingscheidt","doi":"10.1109/TASL.2013.2273716","DOIUrl":"https://doi.org/10.1109/TASL.2013.2273716","url":null,"abstract":"Applications such as professional wireless digital microphones require a transmission of practically uncoded high-quality audio with ultra-low latency on the one hand and robustness to error-prone channels on the other hand. The delay restrictions, however, prohibit the utilization of efficient block or convolutional channel codes for error protection. The contribution of this work is fourfold: We revise and summarize concisely a Bayesian framework for soft-decision audio decoding and present three novel approaches to (almost) latency-free robust decoding of uncompressed audio. Bit reliability information from the transmission channel is exploited, as well as short-term and long-term residual redundancy within the audio signal, and optionally some explicit redundancy in terms of a sample-individual block code. In all cases we utilize variants of higher-order linear prediction to compute prediction probabilities in three novel ways: Firstly by employing a serial cascade of multiple predictors, secondly by exploiting explicit redundancy in form of parity bits, and thirdly by utilizing an interpolative forward/backward prediction algorithm. The first two presented approaches work fully delayless, while the third one introduces an ultra-low algorithmic delay of just a few samples. The effectiveness of the proposed algorithms is proven in simulations with BPSK and typical digital microphone FSK modulation schemes on AWGN and bursty fading channels.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2324-2336"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2273716","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62891352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

Scalable Speech Coding for IP Networks: Beyond iLBC IP网络的可扩展语音编码:超越iLBC

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2274694

Koji Seto, T. Ogunfunmi

{"title":"Scalable Speech Coding for IP Networks: Beyond iLBC","authors":"Koji Seto, T. Ogunfunmi","doi":"10.1109/TASL.2013.2274694","DOIUrl":"https://doi.org/10.1109/TASL.2013.2274694","url":null,"abstract":"High quality speech at low bit rates makes code excited linear prediction (CELP) the dominant choice for a narrowband coding technique despite the susceptibility to packet loss. One of the few techniques which received attention after the introduction of CELP coding technique is the internet low bitrate codec (iLBC) because of inherent high robustness to packet loss. Addition of rate flexibility and scalability makes the iLBC an attractive choice for voice communication over IP networks. In this paper, performance improvement schemes of multi-rate iLBC and its scalable structure are proposed, and the proposed codec enhanced from the previous work is re-designed based on the subjective listening quality instead of the objective quality. In particular, perceptual weighting and the modified discrete cosine transform (MDCT) with short overlap in weighted signal domain are employed along with the improved packet loss concealment (PLC) algorithm. The subjective evaluation results show that the speech quality of the proposed codec is equivalent to that of state-of-the-art codec, G.718, under both a clean channel condition and lossy channel conditions. This result is significant considering that development of the proposed codec is still in early stage.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2337-2345"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2274694","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62891744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10