{"title":"Learning from high-dimensional noisy data via projections onto multi-dimensional ellipsoids","authors":"Liuling Gong, D. Schonfeld","doi":"10.1109/ICASSP.2010.5495284","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495284","url":null,"abstract":"In this paper, we examine the problem of learning from noise-contaminated data in high-dimensional space. A new learning approach based on projections onto multi-dimensional ellipsoids (POME) is introduced, which is applicable to unsupervised clustering, semi-supervised clustering and classification in high-dimensional noisy data. Unlike the traditional learning techniques, where local information is used for data analysis, the proposed POME-based scheme incorporates a priori information of the data distribution. Experimental results in unsupervised clustering demonstrate the superiority of the proposed POME-based scheme to some well-known clustering algorithms, including the k-means and the hierarchical agglomerative clustering. We also illustrate the effectiveness of our proposed POME-based scheme in semi-supervised learning by simulation.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121310299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rate-distortion performance analysis of an analog motion estimation array","authors":"L. Koskinen, J. Poikonen, M. Laiho, A. Paasio","doi":"10.1109/ICASSP.2010.5495511","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495511","url":null,"abstract":"Emerging 3D-integration enables integrating high quality image sensors with various massively parallel processing elements. Analog motion estimation is one potential application, which is likely to result in significant benefits in the form of low power or high frame-rate 3D-integrated image sensor-processors. The system-level operation of a proposed analog motion estimation array, enabling all various block sizes from 4×4 to 16×16 is examined. The analog motion estimation circuitry has been designed as a 32×32 test array in 0.13 µm CMOS technology. The transistor-level simulation results combined with H.264/AVC JM 14.2 show equivalent rate-distortion results with SAD as the error measure and an approximately 7% increase in bitrate with a slight increase in image quality for SSE.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114363484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voice activity detection using harmonic frequency components in likelihood ratio test","authors":"L. Tan, B. J. Borgstrom, A. Alwan","doi":"10.1109/ICASSP.2010.5495611","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495611","url":null,"abstract":"This paper proposes a new statistical model-based likelihood ratio test (LRT) VAD to obtain reliable speech / non-speech decisions. In the proposed method, the likelihood ratio (LR) is calculated differently for voiced frames, as opposed to unvoiced frames: only DFT bins containing harmonic spectral peaks are selected for LR computation. To evaluate the new VAD's effectiveness in improving the noise-robustness of ASR, its decisions are applied to pre-processing techniques such as non-linear spectral subtraction, minimum mean square error short-time spectral amplitude estimator, and frame dropping. From the ASR experiments conducted on the Aurora2 database, the proposed harmonic frequency-based LRTs give better results than conventional LRT-based VADs and the standard G.729B and ETSI AMR VADs.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116329839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using an exponential power model for Wyner-Ziv video coding","authors":"Thomas Maugey, Jérôme Gauthier, B. Pesquet-Popescu, C. Guillemot","doi":"10.1109/ICASSP.2010.5496065","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496065","url":null,"abstract":"The Laplacian model is the standard distribution for correlation noise estimation at the turbodecoder in Wyner-Ziv coding schemes. In practice, this hypothesis is not always satisfied and, regularly, the estimated model sensibly differs from the error distribution. In this work, we prove that using a model better fitted to the true distribution improves the performances, and we thus propose to use the more general exponential power distribution (EPD) which has never been tested in a distributed video coding context. Gains in rate-distortion over the Laplacian model are illustrated by results on several video sequences, showing that the EPD model outperforms the Laplacian one in off-line (oracle) as well as in on-line (practical implementation) modes. These results also indicate that, in some cases, the online EPD model reduces the bitrate even over the off-line Laplacian model.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121498488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient method to generate ground truth for evaluating lane detection systems","authors":"Amol Borkar, M. Hayes, Mark T. Smith","doi":"10.1109/ICASSP.2010.5495346","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495346","url":null,"abstract":"In this document, a new and efficient method to specify the ground truth locations of lane markers is presented. The method comprises a novel process called Time-Slicing that provides the user with a unique visualization of the video. Coupled with automation via spline interpolation, the quick generation of necessary ground truth information is achieved. Videos recorded from a vehicle while driving on local city roads and highways are marked with ground truth information for use in testing. The performance of a variety of lane detection systems is compared to the ground truth and the error is computed for each system. Finally, quantitative analysis shows that the reference lane detection system presented in [1] produces the most accurate lane detections, as indicated by the smallest error.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121515152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new mode selection technique for coding Depth maps of 3D video","authors":"D. V. S. X. D. Silva, W. Fernando, H. K. Arachchi","doi":"10.1109/ICASSP.2010.5495093","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495093","url":null,"abstract":"Compression of Depth maps that are used in 3D video systems based on Depth Image Based Rendering (DIBR) poses a new challenge in video coding, since it is not a sequence of images for final viewing by end users rather an aid for rendering. Therefore, compressing depth maps using existing video coding techniques yields unacceptable distortions while rendering virtual views. In this paper we propose a novel mode selection method for offline compression of depth maps by selecting modes collaboratively considering an entire row of macroblocks together. For selecting these modes while encoding, we propose a novel distortion criteria that incorporates rendering distortions instead of distortion of depth map itself. A genetic algorithm based optimization technique is used for the mode selection. The simulation results suggest that the proposed technique can improve the PSNR up to 1.6dB in the rendered stereoscopic views in comparison to the block wise mode selection method based on Lagrange Optimization and the distortion of the depth map itself.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121548379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A signal-specific bound for joint TDOA and FDOA estimation and its use in combining multiple segments","authors":"A. Yeredor","doi":"10.1109/ICASSP.2010.5495820","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495820","url":null,"abstract":"We consider passive joint estimation of the time-difference of arrival (TDOA) and frequency-difference of arrival (FDOA) of an unknown signal at two sensors. The classical approach for deriving the Cramér-Rao bound (CRB) in this context assumes that the signal (as well as the noise) is Gaussian and stationary. As a result, the obtained Fisher information matrix with respect to the TDOA and FDOA is diagonal, implying that the respective estimation errors are uncorrelated (under asymptotic conditions). However, for some specific (non-Gaussian, non-stationary) signals, especially chirp-like signals, these errors can be strongly correlated. In this work we derive a “signal-specific” (or a “conditional”) CRB for this problem: Modeling the signal as a deterministic unknown, we obtain a bound which, given any particular signal, can reflect the possible signal-induced correlation between the TDOA and FDOA estimates. We further demonstrate that this bound is instrumental for proper weighting when combining joint TDOA and FDOA estimates from independent intervals.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114707142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomized incremental protocols over adaptive networks","authors":"C. G. Lopes, A. H. Sayed","doi":"10.1109/ICASSP.2010.5495951","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495951","url":null,"abstract":"We introduce an incremental cooperation mode into the framework of adaptive networks (AN). The method applies to generic topologies and avoids the need to establish a Hamiltonian cycle over the network, generalizing the original incremental mode, while keeping nearly the same mean-square performance, as illustrated by the simulations. We motivate the new mode by relying on an LMS rule at the nodes, and mean-square analysis is provided.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114761062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A dual perspective on separable semidefinite programming with applications to optimal beamforming","authors":"Yongwei Huang, D. Palomar","doi":"10.1109/ICASSP.2010.5496110","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496110","url":null,"abstract":"Consider the downlink beamforming optimization problem with signal-to-interference-plus-noise ratio constraints, null-shaping interference constraints and multiple groups of individual shaping constraints. We propose an efficient algorithm for the problem, which consists of first solving the dual of the semidefinite program (SDP) relaxation, then formulating a linear program (LP) and solving it to find a rank-one solution of the SDP relaxation. In contrast to the existing algorithms, the analysis of the proposed algorithm includes neither the rank reduction steps (purification process) nor the Perron-Frobenius theorem.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114819004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal motion smoothness measurement for reduced-reference video quality assessment","authors":"Kai Zeng, Zhou Wang","doi":"10.1109/ICASSP.2010.5495316","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495316","url":null,"abstract":"Reduced-reference (RR) video quality measures aim to predict the perceptual quality of distorted video signals using only partial information about the reference video. Existing RR video quality assessment models are mostly designed and/or trained for specific applications such as lossy compression, where the detectable distortion types are often fixed and limited. Here we propose a novel approach that measures temporal motion smoothness of a video sequence by examining the temporal variations of local phase structures in the complex wavelet transform domain. We show that the proposed measure can detect a wide range of well-known practical distortions, including noise contamination, blurring, line or frame jittering, and frame dropping. In addition, the proposed algorithm does not require a costly motion estimation process and has a low RR data rate, making it much easier to be adopted in real-world visual communication applications.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114853171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}