Title: Resolution enhancement for hyperspectral images: A super-resolution and fusion approach
Authors: C. Kwan, J. H. Choi, Stanley H. Chan, Jin Zhou, Bence Budavari
In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017. DOI: 10.1109/ICASSP.2017.7953344
Abstract: Many remote sensing applications require a high-resolution hyperspectral image. However, the resolution of most hyperspectral imagers is limited to tens of meters. Existing resolution enhancement techniques either acquire additional multispectral band images or use a pan band image; the former poses hardware challenges, whereas the latter has limited performance. In this paper, we present a new resolution enhancement method that requires only a color image. Our approach integrates two newly developed techniques: (1) a hybrid color mapping algorithm, and (2) a Plug-and-Play algorithm for single-image super-resolution. Comprehensive experiments using real hyperspectral images are conducted to validate and evaluate the proposed method.
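The core idea of mapping a color image to hyperspectral bands can be illustrated with a plain least-squares color mapping on synthetic data. This is a minimal sketch of the linear mapping step only, not the authors' full hybrid color mapping or Plug-and-Play pipeline; all array sizes and data here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: N co-registered pixels, 3 color bands, B hyperspectral bands.
# A hidden linear map T_true relates color vectors to hyperspectral vectors.
N, B = 1000, 30
T_true = rng.random((B, 3))
rgb = rng.random((N, 3))
hsi = rgb @ T_true.T + 0.01 * rng.standard_normal((N, B))

# Least-squares fit of the color-to-hyperspectral mapping from training pairs.
T_hat, *_ = np.linalg.lstsq(rgb, hsi, rcond=None)   # shape (3, B)

# Apply the learned map to (high-resolution) color pixels.
hsi_pred = rgb @ T_hat
rel_err = np.linalg.norm(hsi_pred - hsi) / np.linalg.norm(hsi)
print(rel_err)  # small, since the synthetic data is nearly linear
```

In practice the mapping would be learned at the low resolution where both modalities are available, then applied to the high-resolution color image.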
Title: Statistics of natural fused image distortions
Authors: D. E. Moreno-Villamarín, H. Benítez-Restrepo, A. Bovik
In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017. DOI: 10.1109/ICASSP.2017.7952355
Abstract: The capability to automatically evaluate the quality of long wave infrared (LWIR) and visible light images has the potential to play an important role in determining and controlling the quality of a resulting fused LWIR-visible image. Extensive work has been conducted on studying the statistics of natural LWIR and visible light images. Nonetheless, there has been little work done on analyzing the statistics of fused images and associated distortions. In this paper, we study the natural scene statistics (NSS) of fused images and how they are affected by several common types of distortions, including blur, white noise, JPEG compression, and non-uniformity (NU). Based on the results of a separate subjective study on the quality of pristine and degraded fused images, we propose an opinion-aware (OA) fused image quality analyzer, whose relative predictions with respect to other state-of-the-art metrics correlate better with human perceptual evaluations.
Title: A multiple bandwidth objective speech intelligibility estimator based on articulation index band correlations and attention
Authors: S. Voran
In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017. DOI: 10.1109/ICASSP.2017.7953128
Abstract: We present ABC-MRT16—a new algorithm for objective estimation of speech intelligibility following the Modified Rhyme Test (MRT) paradigm. ABC-MRT16 is simple, effective, and robust. When compared to subjective MRT data from 367 diverse conditions that include coding, noise, frame erasures, and much more, ABC-MRT16 (containing just one optimized parameter) yields a very high Pearson correlation (above 0.95) and a remarkably low RMS estimation error (below 7% of full scale). We attribute these successes to concise modeling of core human processes in audition and forced-choice word selection. On each trial, ABC-MRT16 gathers word selection evidence in the form of articulation index band correlations and then uses a simple attention model to perform word selection using the best available evidence. Attending to best evidence allows ABC-MRT16 to work well for narrowband, wideband, superwideband, and fullband speech and noise without any bandwidth detection algorithm or side information.
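The "attend to best evidence" idea can be sketched abstractly: given per-band correlations between a received trial and each candidate word's template, pool only the strongest bands before choosing a word. This is a hypothetical toy model of the selection step, not the ABC-MRT16 algorithm itself; the function name, array layout, and numbers are invented for illustration.

```python
import numpy as np

def select_word(band_corrs, k=4):
    """Pick the candidate word whose best per-band evidence is strongest.
    band_corrs: (num_words, num_bands) array of correlations between the
    received trial and each word's template. A simple attention model keeps
    only the top-k bands per word, so unusable bands (e.g. outside the
    signal's bandwidth) are ignored."""
    top = np.sort(band_corrs, axis=1)[:, -k:]   # k strongest bands per word
    scores = top.mean(axis=1)                   # pooled best evidence
    return int(np.argmax(scores))

# Toy trial: word 2 matches well in a few bands, the others do not.
corrs = np.array([
    [0.2, 0.1, 0.3, 0.2, 0.1, 0.2],
    [0.3, 0.2, 0.1, 0.3, 0.2, 0.1],
    [0.1, 0.9, 0.8, 0.2, 0.9, 0.7],
])
print(select_word(corrs))  # -> 2
```

Pooling only the strongest bands is what lets such a scheme work across narrowband through fullband input without detecting the bandwidth explicitly.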
Title: An LSTM-CTC based verification system for proxy-word based OOV keyword search
Authors: Zhiqiang Lv, Jian Kang, Weiqiang Zhang, Jia Liu
In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017. DOI: 10.1109/ICASSP.2017.7953239
Abstract: Proxy-word based out-of-vocabulary (OOV) keyword search has proven to be quite effective. Each OOV keyword is assigned several proxies, and detections of the proxies are regarded as detections of the OOV keyword. However, the confidence scores of these detections are still those of the proxies, taken from lattices. To obtain a better confidence measure, we employ an LSTM-CTC verification method in this work, and the confidence scores are regenerated. OOV keyword search results on the evalpart1 dataset of the OpenKWS16 Evaluation show consistent improvement, with a maximum relative improvement of 21.06% on the MTWV metric.
Title: Sparse eigenvectors of graphs
Authors: Oguzhan Teke, P. Vaidyanathan
In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017. DOI: 10.1109/ICASSP.2017.7952888
Abstract: In order to analyze signals defined over graphs, many concepts from the classical signal processing theory have been extended to the graph case. One of these concepts is the uncertainty principle, which studies the concentration of a signal on a graph and its graph Fourier basis (GFB). An eigenvector of a graph is the most localized signal in the GFB by definition, whereas it may not be localized in the vertex domain. However, if the eigenvector itself is sparse, then it is concentrated in both domains simultaneously. In this regard, this paper studies the necessary and sufficient conditions for the existence of 1, 2, and 3-sparse eigenvectors of the graph Laplacian. The provided conditions are purely algebraic and only use the adjacency information of the graph. Examples of both classical and real-world graphs with sparse eigenvectors are also presented.
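A classical way that 2-sparse Laplacian eigenvectors arise is from "twin" vertices with identical neighborhoods: if non-adjacent vertices i and j both connect to exactly the same set of other vertices, then e_i - e_j is an eigenvector of the combinatorial Laplacian with eigenvalue equal to their common degree. The following numeric check is a small illustration on a hypothetical 4-vertex graph, not a reproduction of the paper's general conditions.

```python
import numpy as np

# Two non-adjacent "twin" vertices (0 and 1) share the neighborhood {2, 3};
# vertices 2 and 3 are also joined to each other.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 1],
              [1, 1, 1, 0]])
L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian

v = np.array([1.0, -1.0, 0.0, 0.0])    # 2-sparse candidate eigenvector
print(L @ v)                            # -> [ 2. -2.  0.  0.] = 2 * v
```

Here v is supported on only two vertices yet is an exact eigenvector (eigenvalue 2, the shared degree), so it is maximally localized in both the vertex domain and the GFB, which is exactly the situation the paper characterizes.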
Title: Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection
Authors: Yougen Yuan, C. Leung, Lei Xie, Hongjie Chen, B. Ma, Haizhou Li
In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017. DOI: 10.1109/ICASSP.2017.7953237
Abstract: We propose to use a feature representation obtained by pairwise learning in a low-resource language for query-by-example spoken term detection (QbE-STD). We assume that word pairs identified by humans are available in the low-resource target language. The word pairs are parameterized by a multi-lingual bottleneck feature (BNF) extractor that is trained using transcribed data in high-resource languages. The multi-lingual BNFs of the word pairs are used as an initial feature representation to train an autoencoder (AE). We extract features from an internal hidden layer of the pairwise trained AE to perform acoustic pattern matching for QbE-STD. Our experiments on the TIMIT and Switchboard corpora show that the pairwise learning brings 7.61% and 8.75% relative improvements in mean average precision (MAP) respectively over the initial feature representation.
Title: A non-intrusive Short-Time Objective Intelligibility measure
Authors: A. H. Andersen, Jan Mark de Haan, Z. Tan, J. Jensen
In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017. DOI: 10.1109/ICASSP.2017.7953125
Abstract: We propose a non-intrusive intelligibility measure for noisy and non-linearly processed speech, i.e. a measure which can predict intelligibility from a degraded speech signal without requiring a clean reference signal. The proposed measure is based on the Short-Time Objective Intelligibility (STOI) measure. In particular, the non-intrusive STOI measure estimates clean signal amplitude envelopes from the degraded signal. Subsequently, the STOI measure is evaluated by use of the envelopes of the degraded signal and the estimated clean envelopes. The performance of the proposed measure is evaluated on a dataset including speech in different noise types, processed with binary masks. The measure is shown to predict intelligibility well in all tested conditions, with the exception of those including a single competing speaker. While the measure does not perform as well as the original (intrusive) STOI measure, it is shown to outperform existing non-intrusive measures.
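The intrusive core that STOI-style measures build on is a short-time correlation between reference and degraded amplitude envelopes. The sketch below shows only that simplified envelope-correlation idea on toy data; it omits STOI's actual 1/3-octave filterbank, normalization, and clipping, and the paper's contribution is precisely to replace the clean reference envelopes with estimates computed from the degraded signal alone. All names and data here are hypothetical.

```python
import numpy as np

def envelope_correlation(env_ref, env_deg, win=30):
    """Simplified STOI-style score: mean correlation between reference and
    degraded amplitude envelopes over short non-overlapping windows.
    env_ref, env_deg: (bands, frames) arrays of envelope samples."""
    scores = []
    for b in range(env_ref.shape[0]):
        for t in range(0, env_ref.shape[1] - win + 1, win):
            r = env_ref[b, t:t + win]
            d = env_deg[b, t:t + win]
            r = (r - r.mean()) / (r.std() + 1e-12)   # zero-mean, unit-std
            d = (d - d.mean()) / (d.std() + 1e-12)
            scores.append(np.mean(r * d))            # per-window correlation
    return float(np.mean(scores))

rng = np.random.default_rng(1)
clean = rng.random((15, 300))                    # toy envelopes: 15 bands
noisy = clean + 0.5 * rng.random((15, 300))      # additive degradation
print(envelope_correlation(clean, clean) > envelope_correlation(clean, noisy))  # True
```

A non-intrusive variant would call the same scoring function with an estimated clean envelope in place of `clean`, which is what makes a reference-free prediction possible.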
Title: Unsupervised learning of asymmetric high-order autoregressive stochastic volatility model
Authors: I. Gorynin, E. Monfrini, W. Pieczynski
In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017. DOI: 10.1109/ICASSP.2017.7953064
Abstract: The object of this paper is to introduce a new estimation algorithm specifically designed for latent high-order autoregressive models. It implements the concept of filter-based maximum likelihood. Our approach is fully deterministic and is less computationally demanding than traditional Markov chain Monte Carlo techniques. Simulation experiments and real-world data processing confirm the effectiveness of our approach.
Title: Non-separable quadruple lifting structure for four-dimensional integer Wavelet Transform with reduced rounding noise
Authors: Fairoza Amira Hamzah, Taichi Yoshida, M. Iwahashi
In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017. DOI: 10.1109/ICASSP.2017.7952336
Abstract: The wavelet transform (WT) in JPEG 2000 uses a 'separable' lifting structure, in which a one-dimensional (1D) transform is applied to a multidimensional image signal along each of its spatial and temporal dimensions. An existing 'non-separable' three-dimensional (3D) structure minimizes the number of lifting steps, and for the (5,3) type transform used in lossless coding it has been proved to reduce the rounding noise inside the transform. In the (9,7) type transform used in lossy coding, however, the rounding noise inside the 'non-separable' 3D structure increases. This paper proposes a new 'non-separable' two-dimensional (2D) structure for integer implementation of a four-dimensional (4D) quadruple lifting WT. Since the order of the original lifting steps is preserved, the total rounding noise observed in the pixel values of the decoded image is significantly reduced, and the lossy coding performance for 4D input signals is increased.
Title: Improved cepstra minimum-mean-square-error noise reduction algorithm for robust speech recognition
Authors: Jinyu Li, Yan Huang, Y. Gong
In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017. DOI: 10.1109/ICASSP.2017.7953081
Abstract: In the era of deep learning, although beam-forming multi-channel signal processing is still very helpful, it was reported that single-channel robust front-ends usually cannot benefit deep learning models, because the layer-by-layer structure of deep learning models provides a feature extraction strategy that automatically derives powerful noise-resistant features from primitive raw data for senone classification. In this study, we show that a single-channel robust front-end is still very beneficial to deep learning modeling as long as it is well designed. We improve a robust front-end, cepstra minimum mean square error (CMMSE), by using a more reliable voice activity detector, refined prior SNR estimation, better gain smoothing, and two-stage processing. This new front-end, improved CMMSE (ICMMSE), is evaluated on the standard Aurora 2 and CHiME-3 tasks, and on a 3400-hour Microsoft Cortana digital assistant task, using Gaussian mixture models, feed-forward deep neural networks, and long short-term memory recurrent neural networks, respectively. It is shown that ICMMSE is superior regardless of the underlying acoustic models and the scale of evaluation tasks, with a 25.46% relative WER reduction on Aurora 2, up to 11.98% relative WER reduction on CHiME-3, and up to 11.01% relative WER reduction on the Cortana digital assistant task, respectively.
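Two of the ingredients named in the abstract, prior SNR estimation and gain smoothing, can be illustrated with a classic decision-directed gain track. This is a hedged toy sketch of that standard MMSE-family ingredient, not the paper's actual ICMMSE recipe; the function, constants, and spectra below are invented for illustration.

```python
import numpy as np

def smoothed_gain(noisy_psd, noise_psd, alpha=0.98, g_min=0.1):
    """Toy Wiener-style suppression gain with decision-directed prior-SNR
    smoothing. noisy_psd: (frames, bins) per-frame power spectra;
    noise_psd: (bins,) noise power estimate."""
    gains = []
    prev = np.zeros_like(noise_psd)                    # G^2 * posterior SNR
    for frame in noisy_psd:
        post = frame / noise_psd                       # a posteriori SNR
        prior = alpha * prev + (1 - alpha) * np.maximum(post - 1.0, 0.0)
        g = np.maximum(prior / (1.0 + prior), g_min)   # floored Wiener gain
        gains.append(g)
        prev = g * g * post                            # decision-directed update
    return np.array(gains)

# Toy spectrum: bin 0 carries strong speech energy, the rest are noise-only.
frames = np.ones((20, 4))
frames[:, 0] = 50.0
g = smoothed_gain(frames, np.ones(4))
print(g[-1])   # bin 0 gain near 1, noise-only bins held at the floor
```

The smoothing constant `alpha` and the gain floor `g_min` trade off musical-noise suppression against responsiveness, which is the kind of tuning the paper's refined SNR estimation and gain smoothing address.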