{"title":"CRB analysis of near-field source localization using uniform circular arrays","authors":"J. Delmas, H. Gazzah","doi":"10.1109/ICASSP.2013.6638409","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638409","url":null,"abstract":"This paper is devoted to the Cramer Rao bound (CRB) on the azimuth, elevation and range of a narrow-band near-field source localized by means of a uniform circular array (UCA), using the exact expression of the time delay parameter. After proving that the conditional and unconditional CRB are generally proportional for constant modulus steering vectors, we specify conditions of isotropy w.r.t. the distance and the number of sensors. Then we derive very simple, yet very accurate non-matrix closed-form expressions of different approximations of the CRBs.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125084108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast block-based algorithms for connected components labeling","authors":"Diego J. C. Santiago, Ing Ren Tsang, George D. C. Cavalcanti, I. Tsang","doi":"10.1109/ICASSP.2013.6638021","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638021","url":null,"abstract":"Block-based algorithms are considered the fastest approach to label connected components in binary images. However, the existing algorithms are two-scan which would need more comparisons if they were used as one-and-a-half-scan algorithms. Here, we proposed a new mask that enables the design of a block-based one-and-a-half-scan algorithm without any extra comparison. Furthermore, three new efficient algorithms for connected components labeling are presented: a block-based two-scan, a pixel-based one-and-a-half-scan and a block-based one-and-a-half-scan. We conducted experiments using synthetic and realistic images to evaluate the performance of the proposed methods compared to the existing methods. The proposed block-based one-and-a-half-scan algorithm presents the best performance in the realistic images dataset composed of 1290 documents. Our block-based two-scan algorithm proved to be the fastest in the synthetic dataset, especially in low density images.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125163995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness of speech quality metrics to background noise and network degradations: Comparing ViSQOL, PESQ and POLQA","authors":"Andrew Hines, J. Skoglund, A. Kokaram, N. Harte","doi":"10.1109/ICASSP.2013.6638348","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638348","url":null,"abstract":"The Virtual Speech Quality Objective Listener (ViSQOL) is a new objective speech quality model. It is a signal based full reference metric that uses a spectro-temporal measure of similarity between a reference and a test speech signal. ViSQOL aims to predict the overall quality of experience for the end listener whether the cause of speech quality degradation is due to ambient noise, or transmission channel degradations. This paper describes the algorithm and tests the model using two speech corpora: NOIZEUS and E4. The NOIZEUS corpus contains speech under a variety of background noise types, speech enhancement methods, and SNR levels. The E4 corpus contains voice over IP degradations including packet loss, jitter and clock drift. The results are compared with the ITU-T objective models for speech quality: PESQ and POLQA. The behaviour of the metrics are also evaluated under simulated time warp conditions. The results show that for both datasets ViSQOL performed comparably with PESQ. POLQA was shown to have lower correlation with subjective scores than the other metrics for the NOIZEUS database.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125848680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integral operator based adaptive signal separation approach","authors":"Xiyuan Hu, Silong Peng, W. Hwang","doi":"10.1109/ICASSP.2013.6638837","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638837","url":null,"abstract":"The operator-based signal separation approach uses an adaptive operator to separate a signal into additive subcomponents. And different types of operator can depict different properties of a signal. In this paper, we define a new kind of integral operator which can be derived from the second kind of Fredholm integral equation. Then, we analyze the properties of the proposed integral operator and discuss its relation to the second condition of Intrinsic Mode Function (IMF). To demonstrate the robustness and efficacy of the proposed operator, we incorporate it into the Null Space Pursuit algorithm to separate several multicomponent signals, including a real-life signal.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126086018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced inter-prediction using Merge Prediction Transformation in the HEVC codec","authors":"Saverio G. Blasi, Eduardo Peixoto, E. Izquierdo","doi":"10.1109/ICASSP.2013.6637944","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6637944","url":null,"abstract":"Merge prediction is a novel technique introduced in the HEVC standard to improve inter-prediction exploiting redundancy of themotion information. We propose in this paper a new approach to enhance the Merge mode in a typical HEVC encoder using parametric transformations of the Merge prediction candidates. An Enhanced Inter-Prediction module is implemented in HEVC using Merge Prediction Transformation (MPT), integrated with the HEVC new features such as the large coding units (CU) and the recursive prediction unit partitioning. The MPT parameters are quantised according to the CU depth and the current QP. The optimal quantization steps are derived via statistical analysis as illustrated in the paper. Results show consistent improvements over conventional HEVC encoding in terms of rate-distortion performance, with a small impact on the encoding complexity and negligible impact on the decoding complexity.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125315712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectral envelope estimation used for audio bandwidth extension based on RBF neural network","authors":"Haojie Liu, C. Bao, Xin Liu","doi":"10.1109/ICASSP.2013.6637706","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6637706","url":null,"abstract":"In this paper a new spectral envelope estimation method based on radial basis function (RBF) neural network is proposed for implementing a blind bandwidth extension method of audio signals. To make the sub-band envelope of high-frequency (HF) components accurately recovered, the RBF neural network is utilized to fit the relationship between low-frequency (LF) features and sub-band envelope of HF components. In addition, the fine structure of HF components which can guarantee the timber of the extended audio signal is reconstructed based on nonlinear dynamics. The objective and subjective test results indicate that the proposed method outperforms the reference methods.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125410558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multisource DOA estimation based on time-frequency sparsity and joint inter-sensor data ratio with single acoustic vector sensor","authors":"Y. Zou, Wei Shi, Bo Li, C. Ritz, M. Shujau, J. Xi","doi":"10.1109/ICASSP.2013.6638412","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638412","url":null,"abstract":"By exploring the time-frequency (TF) sparsity property of the speech, the inter-sensor data ratios (ISDRs) of single acoustic vector sensor (AVS) have been derived and investigated. Under noiseless condition, ISDRs have favorable properties, such as being independent of frequency, DOA related with single valuedness, and no constraints on near or far field conditions. With these observations, we further investigated the behavior of ISDRs under noisy conditions and proposed a so-called ISDR-DOA estimation algorithm, where high local SNR data extraction and bivariate kernel density estimation techniques have been adopted to cluster the ISDRs representing the DOA information. Compared with the traditional DOA estimation methods with a small microphone array, the proposed algorithm has the merits of smaller size, no spatial aliasing and less computational cost. Simulation studies show that the proposed method with a single AVS can estimate up to seven sources simultaneously with high accuracy when the SNR is larger than 15dB. In addition, the DOA estimation results based on recorded data further validates the proposed algorithm.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126601797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward body language generation in dyadic interaction settings from interlocutor multimodal cues","authors":"Zhaojun Yang, A. Metallinou, Shrikanth S. Narayanan","doi":"10.1109/ICASSP.2013.6638361","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638361","url":null,"abstract":"During dyadic interactions, participants influence each other's verbal and nonverbal behaviors. In this paper, we examine the coordination between a dyad's body language behavior, such as body motion, posture and relative orientation, given the participants' communication goals, e.g., friendly or conflictive, in improvised interactions. We further describe a Gaussian Mixture Model (GMM) based statistical methodology for automatically generating body language of a listener from speech and gesture cues of a speaker. The experimental results show that automatically generated body language trajectories generally follow the trends of observed trajectories, especially for velocities of body and arms, and that the use of speech information improves prediction performance. These results suggest that there is a significant level of predictability of body language in the examined goal-driven improvisations, which could be exploited for interaction-driven and goal-driven body language generation.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126831196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient algorithm for rational kernel evaluation in large lattice sets","authors":"J. Svec, P. Ircing","doi":"10.1109/ICASSP.2013.6638235","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638235","url":null,"abstract":"This paper presents an effective method for evaluation of the rational kernels represented by finite-state automata. The described algorithm is optimized for processing speed and thus facilitates the usage of state-of-the-art machine learning techniques like Support Vector Machines even in the real-time application of speech and language processing, such as dialogue systems and speech retrieval engines. The performance of the devised algorithm was tested on a spoken language understanding task and the results suggest that it consistently outperforms the baseline algorithm presented in the related literature.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126847150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On passive TDOA and FDOA localization using two sensors with no time or frequency synchronization","authors":"A. Yeredor","doi":"10.1109/ICASSP.2013.6638423","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638423","url":null,"abstract":"Traditional passive localization based on Time-Difference of Arrival (TDOA) or Frequency-Difference of Arrival (FDOA) usually involves several remote sensors, which require precise time-synchronization and frequency-locking among them. The need for such time or frequency alignment sometimes poses a serious operational challenge on the system. In addition, it is often desired to keep the number of sensors to a minimum. In this work we look into the operationally-simplest scenario in this context: using only two sensors, without any synchronization or locking. When at least one of the sensors, or the transmitting target, is moving at some considerable speed, it is still possible to localize the target, based on a few TDOA and / or FDOA measurements, by considering the time- and frequency-offsets as additional unknown parameters. We analyze the associated performance bound and propose a Maximum Likelihood estimation approach. The attainable accuracy and its dependence on geometry are demonstrated numerically and in simulation.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126927515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}