2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Resolution enhancement for hyperspectral images: A super-resolution and fusion approach
C. Kwan, J. H. Choi, Stanley H. Chan, Jin Zhou, Bence Budavari
{"title":"Resolution enhancement for hyperspectral images: A super-resolution and fusion approach","authors":"C. Kwan, J. H. Choi, Stanley H. Chan, Jin Zhou, Bence Budavari","doi":"10.1109/ICASSP.2017.7953344","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953344","url":null,"abstract":"Many remote sensing applications require a high-resolution hyperspectral image. However, resolutions of most hyperspectral imagers are limited to tens of meters. Existing resolution enhancement techniques either acquire additional multispectral band images or use a pan band image. The former poses hardware challenges, whereas the latter has limited performance. In this paper, we present a new resolution enhancement method that only requires a color image. Our approach integrates two newly developed techniques in the area: (1) A hybrid color mapping algorithm, and (2) A Plug-and-Play algorithm for single image super-resolution. Comprehensive experiments using real hyperspectral images are conducted to validate and evaluate the proposed method.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121673230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 58
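The hybrid color mapping idea, learning a mapping from a co-registered color image to hyperspectral bands at low resolution and applying it at high resolution, can be illustrated with a plain least-squares fit. This is a minimal sketch on synthetic data, assuming a simple linear map with a bias term; the function names and toy inputs are illustrative, not the authors' implementation, and the Plug-and-Play super-resolution stage is omitted.

```python
import numpy as np

def learn_color_mapping(rgb_lr, hsi_lr):
    """Least-squares map from RGB (+bias) pixels to hyperspectral pixels.

    rgb_lr: (H, W, 3) low-resolution color image
    hsi_lr: (H, W, B) co-registered low-resolution hyperspectral image
    Returns a (4, B) matrix T such that [r, g, b, 1] @ T approximates the spectrum.
    """
    X = rgb_lr.reshape(-1, 3)
    X = np.hstack([X, np.ones((X.shape[0], 1))])      # bias column
    Y = hsi_lr.reshape(-1, hsi_lr.shape[-1])
    T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return T

def apply_color_mapping(rgb_hr, T):
    """Apply the learned map to a high-resolution color image."""
    H, W, _ = rgb_hr.shape
    X = rgb_hr.reshape(-1, 3)
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    return (X @ T).reshape(H, W, -1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hsi_lr = rng.random((32, 32, 50))                  # toy low-res hyperspectral cube
    rgb_lr = hsi_lr[..., [40, 25, 5]]                  # pretend three bands act as R, G, B
    rgb_hr = np.repeat(np.repeat(rgb_lr, 4, 0), 4, 1)  # stand-in high-res color image
    T = learn_color_mapping(rgb_lr, hsi_lr)
    hsi_hr = apply_color_mapping(rgb_hr, T)
    print(hsi_hr.shape)                                # (128, 128, 50)
```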
Statistics of natural fused image distortions
D. E. Moreno-Villamarín, H. Benítez-Restrepo, A. Bovik
{"title":"Statistics of natural fused image distortions","authors":"D. E. Moreno-Villamarín, H. Benítez-Restrepo, A. Bovik","doi":"10.1109/ICASSP.2017.7952355","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952355","url":null,"abstract":"The capability to automatically evaluate the quality of long wave infrared (LWIR) and visible light images has the potential to play an important role in determining and controlling the quality of a resulting fused LWIR-visible image. Extensive work has been conducted on studying the statistics of natural LWIR and visible light images. Nonetheless, there has been little work done on analyzing the statistics of fused images and associated distortions. In this paper, we study the natural scene statistics (NSS) of fused images and how they are affected by several common types of distortions, including blur, white noise, JPEG compression, and non-uniformity (NU). Based on the results of a separate subjective study on the quality of pristine and degraded fused images, we propose an opinion-aware (OA) fused image quality analyzer, whose relative predictions with respect to other state-of-the-art metrics correlate better with human perceptual evaluations.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122400923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
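Natural scene statistics models of this kind typically start from mean-subtracted contrast-normalized (MSCN) coefficients, whose empirical distribution changes shape under distortions such as blur, noise, compression, or non-uniformity. The following is a minimal sketch of MSCN computation and a few summary statistics, assuming a Gaussian local window; the specific feature set and the opinion-aware regression of the paper are not reproduced.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(img, sigma=7 / 6, eps=1e-3):
    """Mean-subtracted contrast-normalized coefficients of a grayscale image."""
    img = img.astype(np.float64)
    mu = gaussian_filter(img, sigma)
    var = gaussian_filter(img * img, sigma) - mu * mu
    sigma_map = np.sqrt(np.clip(var, 0.0, None))
    return (img - mu) / (sigma_map + eps)

def nss_summary(img):
    """A few NSS-style summary statistics of the MSCN field."""
    m = mscn_coefficients(img)
    return {
        "variance": float(m.var()),
        "kurtosis": float(((m - m.mean()) ** 4).mean() / (m.var() ** 2 + 1e-12)),
        # products of horizontally adjacent coefficients capture structural correlation
        "horiz_corr": float((m[:, :-1] * m[:, 1:]).mean()),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((128, 128))
    noisy = clean + 0.2 * rng.standard_normal(clean.shape)
    print("clean:", nss_summary(clean))
    print("noisy:", nss_summary(noisy))
```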
A multiple bandwidth objective speech intelligibility estimator based on articulation index band correlations and attention
S. Voran
{"title":"A multiple bandwidth objective speech intelligibility estimator based on articulation index band correlations and attention","authors":"S. Voran","doi":"10.1109/ICASSP.2017.7953128","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953128","url":null,"abstract":"We present ABC-MRT16—a new algorithm for objective estimation of speech intelligibility following the Modified Rhyme Test (MRT) paradigm. ABC-MRT16 is simple, effective and robust. When compared to subjective MRT data from 367 diverse conditions that include coding, noise, frame erasures, and much more, ABC-MRT16 (containing just one optimized parameter) yields a very high Pearson correlation (above 0.95) and a remarkably low RMS estimation error (below 7% of full scale.) We attribute these successes to concise modeling of core human processes in audition and forced-choice word selection. On each trial, ABC-MRT16 gathers word selection evidence in the form of articulation index band correlations and then uses a simple attention model to perform word selection using the best available evidence. Attending to best evidence allows ABC-MRT16 to work well for narrowband, wideband, superwideband, and fullband speech and noise without any bandwidth detection algorithm or side information.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123009996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
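The attention mechanism described in the abstract can be mimicked on toy data: correlate the received word's band envelopes with each candidate template band by band, keep only the strongest band correlations, and choose the candidate with the best retained evidence. This is a sketch under assumed band counts and random templates, not the ABC-MRT16 feature pipeline.

```python
import numpy as np

def band_correlations(received, template):
    """Pearson correlation per articulation-index-style band.

    received, template: (num_bands, num_frames) envelope matrices.
    Returns an array of per-band correlations.
    """
    r = received - received.mean(axis=1, keepdims=True)
    t = template - template.mean(axis=1, keepdims=True)
    num = (r * t).sum(axis=1)
    den = np.sqrt((r * r).sum(axis=1) * (t * t).sum(axis=1)) + 1e-12
    return num / den

def choose_word(received, templates, attend=4):
    """Score each candidate by its best `attend` band correlations and pick the maximum."""
    scores = []
    for tmpl in templates:
        c = band_correlations(received, tmpl)
        scores.append(np.sort(c)[-attend:].mean())   # attention: keep only the best evidence
    return int(np.argmax(scores)), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    templates = [rng.random((21, 40)) for _ in range(6)]           # 6 rhyming candidates
    received = templates[2] + 0.3 * rng.standard_normal((21, 40))  # noisy version of word 2
    choice, scores = choose_word(received, templates)
    print("chosen candidate:", choice)
```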
An LSTM-CTC based verification system for proxy-word based OOV keyword search
Zhiqiang Lv, Jian Kang, Weiqiang Zhang, Jia Liu
{"title":"An LSTM-CTC based verification system for proxy-word based OOV keyword search","authors":"Zhiqiang Lv, Jian Kang, Weiqiang Zhang, Jia Liu","doi":"10.1109/ICASSP.2017.7953239","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953239","url":null,"abstract":"Proxy-word based out of vocabulary (OOV) keyword search has been proven to be quite effective in keyword search. In proxy-word based OOV keyword search, each OOV keyword is assigned several proxies and detections of the proxies are regarded as detections of the OOV keywords. However, the confidence scores of these detections are still those of the proxies from lattices. To obtain a better confidence measure, we employ an LSTM-CTC verification method in this work and the confidence scores are regenerated. OOV keyword search results on the evalpart1 dataset of the OpenKWS16 Evaluation have shown consistent improvement and the maximum relative improvement can reach 21.06% for the MWTW metric.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"767 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117028378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
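Rescoring a proxy detection with a CTC acoustic model reduces to the standard CTC forward algorithm: the probability that the frames inside the detection window emit the keyword's label sequence. Below is a sketch of that forward scoring over given per-frame log posteriors; the posteriors here are random placeholders, whereas in the paper they would come from the trained LSTM-CTC network.

```python
import numpy as np

def ctc_forward_score(log_probs, labels, blank=0):
    """Log-probability that a frame-wise posterior matrix emits `labels` under CTC.

    log_probs: (T, V) log posteriors over V symbols (index `blank` is the CTC blank).
    labels: non-empty list of label indices for the keyword (no blanks).
    """
    # Extended label sequence with blanks: [b, l1, b, l2, ..., lN, b]
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S, T = len(ext), log_probs.shape[0]

    def logsumexp(*xs):
        m = max(xs)
        return m + np.log(sum(np.exp(x - m) for x in xs)) if m > -np.inf else -np.inf

    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            cands = [alpha[t - 1, s]]
            if s > 0:
                cands.append(alpha[t - 1, s - 1])
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2])      # skip over blank between distinct labels
            alpha[t, s] = logsumexp(*cands) + log_probs[t, ext[s]]

    return logsumexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, V = 30, 10                       # 30 frames, 9 phones + blank
    logits = rng.standard_normal((T, V))
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    keyword = [3, 5, 5, 2]              # hypothetical phone sequence of an OOV keyword
    print("CTC log-score:", ctc_forward_score(log_probs, keyword))
```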
Sparse eigenvectors of graphs
Oguzhan Teke, P. Vaidyanathan
{"title":"Sparse eigenvectors of graphs","authors":"Oguzhan Teke, P. Vaidyanathan","doi":"10.1109/ICASSP.2017.7952888","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952888","url":null,"abstract":"In order to analyze signals defined over graphs, many concepts from the classical signal processing theory have been extended to the graph case. One of these concepts is the uncertainty principle, which studies the concentration of a signal on a graph and its graph Fourier basis (GFB). An eigenvector of a graph is the most localized signal in the GFB by definition, whereas it may not be localized in the vertex domain. However, if the eigenvector itself is sparse, then it is concentrated in both domains simultaneously. In this regard, this paper studies the necessary and sufficient conditions for the existence of 1, 2, and 3-sparse eigenvectors of the graph Laplacian. The provided conditions are purely algebraic and only use the adjacency information of the graph. Examples of both classical and real-world graphs with sparse eigenvectors are also presented.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124327026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
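The paper's conditions are algebraic, but the underlying definition is easy to check numerically: a vector supported on a vertex set S is a Laplacian eigenvector exactly when it is an eigenvector of the principal submatrix L[S, S] and is annihilated by the remaining rows L[rest, S]. The following brute-force sketch for small graphs illustrates that definition rather than the paper's necessary and sufficient conditions.

```python
import itertools
import numpy as np

def sparse_laplacian_eigenvectors(A, k, tol=1e-9):
    """Brute-force search for exactly k-sparse eigenvectors of L = D - A.

    A vector x supported on a vertex set S is an eigenvector of L iff
    (L[S, S] - lam * I) x_S = 0 and L[rest, S] x_S = 0 for some lam.
    Returns (support, eigenvalue, x_S) triples.
    """
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    found = []
    for S in itertools.combinations(range(n), k):
        S, rest = list(S), [i for i in range(n) if i not in S]
        L_SS, L_RS = L[np.ix_(S, S)], L[np.ix_(rest, S)]
        for lam in np.unique(np.round(np.linalg.eigvalsh(L_SS), 8)):
            M = np.vstack([L_SS - lam * np.eye(k), L_RS])
            _, sv, Vt = np.linalg.svd(M)
            if sv[-1] < tol:                       # non-trivial joint null space
                x = Vt[-1]
                if np.all(np.abs(x) > tol):        # support is exactly S
                    found.append((tuple(S), float(lam), x))
    return found

if __name__ == "__main__":
    # 4-cycle 0-1-3-2-0: vertices 1 and 2 share the neighborhood {0, 3},
    # so e_1 - e_2 (and likewise e_0 - e_3) is a 2-sparse Laplacian eigenvector.
    A = np.zeros((4, 4))
    for i, j in [(0, 1), (0, 2), (1, 3), (2, 3)]:
        A[i, j] = A[j, i] = 1
    print(sparse_laplacian_eigenvectors(A, k=1))   # [] -- no isolated vertex
    for supp, lam, x in sparse_laplacian_eigenvectors(A, k=2):
        print("support", supp, "eigenvalue", lam, "values", np.round(x, 3))
```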
Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection
Yougen Yuan, C. Leung, Lei Xie, Hongjie Chen, B. Ma, Haizhou Li
{"title":"Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection","authors":"Yougen Yuan, C. Leung, Lei Xie, Hongjie Chen, B. Ma, Haizhou Li","doi":"10.1109/ICASSP.2017.7953237","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953237","url":null,"abstract":"We propose to use a feature representation obtained by pairwise learning in a low-resource language for query-by-example spoken term detection (QbE-STD). We assume that word pairs identified by humans are available in the low-resource target language. The word pairs are parameterized by a multi-lingual bottleneck feature (BNF) extractor that is trained using transcribed data in high-resource languages. The multi-lingual BNFs of the word pairs are used as an initial feature representation to train an autoencoder (AE). We extract features from an internal hidden layer of the pairwise trained AE to perform acoustic pattern matching for QbE-STD. Our experiments on the TIMIT and Switchboard corpora show that the pairwise learning brings 7.61% and 8.75% relative improvements in mean average precision (MAP) respectively over the initial feature representation.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114901563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
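Downstream of the learned representation, QbE-STD amounts to matching a query's frame-level feature sequence against utterance features, commonly with dynamic time warping over frame-wise cosine distances. This sketch covers only that matching stage, with random stand-ins for the BNF/autoencoder features; the pairwise autoencoder training and the subsequence search details of the paper are omitted.

```python
import numpy as np

def cosine_distance_matrix(Q, X):
    """Frame-wise cosine distances between a query (Tq, D) and an utterance (Tx, D)."""
    Qn = Q / (np.linalg.norm(Q, axis=1, keepdims=True) + 1e-12)
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return 1.0 - Qn @ Xn.T

def dtw_cost(dist):
    """Classic DTW alignment cost, normalized by the query length."""
    Tq, Tx = dist.shape
    D = np.full((Tq + 1, Tx + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Tq + 1):
        for j in range(1, Tx + 1):
            D[i, j] = dist[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Tq, Tx] / Tq

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.standard_normal((20, 40))               # 20 frames of 40-dim features
    match = np.repeat(query, 2, axis=0) + 0.1 * rng.standard_normal((40, 40))
    nonmatch = rng.standard_normal((40, 40))
    print("match   :", dtw_cost(cosine_distance_matrix(query, match)))
    print("nonmatch:", dtw_cost(cosine_distance_matrix(query, nonmatch)))
```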
A non-intrusive Short-Time Objective Intelligibility measure
A. H. Andersen, Jan Mark de Haan, Z. Tan, J. Jensen
{"title":"A non-intrusive Short-Time Objective Intelligibility measure","authors":"A. H. Andersen, Jan Mark de Haan, Z. Tan, J. Jensen","doi":"10.1109/ICASSP.2017.7953125","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953125","url":null,"abstract":"We propose a non-intrusive intelligibility measure for noisy and non-linearly processed speech, i.e. a measure which can predict intelligibility from a degraded speech signal without requiring a clean reference signal. The proposed measure is based on the Short-Time Objective Intelligibility (STOI) measure. In particular, the non-intrusive STOI measure estimates clean signal amplitude envelopes from the degraded signal. Subsequently, the STOI measure is evaluated by use of the envelopes of the degraded signal and the estimated clean envelopes. The performance of the proposed measure is evaluated on a dataset including speech in different noise types, processed with binary masks. The measure is shown to predict intelligibility well in all tested conditions, with the exception of those including a single competing speaker. While the measure does not perform as well as the original (intrusive) STOI measure, it is shown to outperform existing non-intrusive measures.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125029761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 32
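The measure inherits STOI's core computation: correlate short segments of band envelopes of a reference and a degraded signal and average the correlations; the non-intrusive variant substitutes estimated clean envelopes for the true ones. This sketch covers only the correlation stage given two envelope matrices, with band analysis, clipping, and the 384 ms segmentation simplified; the crude envelope "estimator" in the example is purely illustrative.

```python
import numpy as np

def stoi_like_score(env_ref, env_deg, seg_len=30):
    """Average short-time correlation between band envelopes.

    env_ref, env_deg: (num_bands, num_frames) envelope matrices
    seg_len: segment length in frames (STOI uses segments of roughly 384 ms)
    """
    num_bands, num_frames = env_ref.shape
    scores = []
    for start in range(0, num_frames - seg_len + 1, seg_len):
        r = env_ref[:, start:start + seg_len]
        d = env_deg[:, start:start + seg_len]
        r = r - r.mean(axis=1, keepdims=True)
        d = d - d.mean(axis=1, keepdims=True)
        num = (r * d).sum(axis=1)
        den = np.sqrt((r * r).sum(axis=1) * (d * d).sum(axis=1)) + 1e-12
        scores.append(num / den)
    return float(np.mean(scores))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean_env = rng.random((15, 300))                      # 15 one-third-octave bands
    noisy_env = clean_env + 0.3 * rng.random((15, 300))
    estimated_clean = noisy_env - 0.15                     # crude stand-in for an envelope estimator
    print("intrusive    :", stoi_like_score(clean_env, noisy_env))
    print("non-intrusive:", stoi_like_score(estimated_clean, noisy_env))
```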
Unsupervised learning of asymmetric high-order autoregressive stochastic volatility model
I. Gorynin, E. Monfrini, W. Pieczynski
{"title":"Unsupervised learning of asymmetric high-order autoregressive stochastic volatility model","authors":"I. Gorynin, E. Monfrini, W. Pieczynski","doi":"10.1109/ICASSP.2017.7953064","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953064","url":null,"abstract":"The object of this paper is to introduce a new estimation algorithm specifically designed for the latent high-order autoregressive models. It implements the concept of the filter-based maximum likelihood. Our approach is fully deterministic and is less computationally demanding than the traditional Monte Carlo Markov chain techniques. The simulation experiments and real-world data processing confirm the interest of our approach.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116496682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
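The model under study is a stochastic volatility process whose log-variance follows a high-order autoregression, with asymmetry (the leverage effect) introduced by correlating return and volatility innovations. Below is a simulation sketch under assumed parameter values and an assumed form of the asymmetry; the filter-based maximum-likelihood estimator itself is not reproduced here.

```python
import numpy as np

def simulate_arsv(T, phi, mu=-1.0, sigma_eta=0.2, rho=-0.5, seed=0):
    """Simulate an asymmetric AR(p) stochastic volatility model.

    h_t = mu + sum_i phi_i (h_{t-i} - mu) + sigma_eta * eta_t
    y_t = exp(h_t / 2) * eps_t,  with corr(eps_t, eta_{t+1}) = rho  (leverage effect)
    """
    rng = np.random.default_rng(seed)
    p = len(phi)
    h = np.full(T + p, mu)
    y = np.zeros(T)
    eps_prev = 0.0
    for t in range(p, T + p):
        # volatility innovation correlated with the previous return innovation
        eta = rho * eps_prev + np.sqrt(1.0 - rho ** 2) * rng.standard_normal()
        h[t] = mu + sum(phi[i] * (h[t - 1 - i] - mu) for i in range(p)) + sigma_eta * eta
        eps_prev = rng.standard_normal()
        y[t - p] = np.exp(h[t] / 2.0) * eps_prev
    return y, h[p:]

if __name__ == "__main__":
    y, h = simulate_arsv(T=2000, phi=[0.55, 0.25, 0.1])    # third-order autoregression
    print("return std:", y.std(), "log-variance mean:", h.mean())
```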
Non-separable quadruple lifting structure for four-dimensional integer Wavelet Transform with reduced rounding noise
Fairoza Amira Hamzah, Taichi Yoshida, M. Iwahashi
{"title":"Non-separable quadruple lifting structure for four-dimensional integer Wavelet Transform with reduced rounding noise","authors":"Fairoza Amira Hamzah, Taichi Yoshida, M. Iwahashi","doi":"10.1109/ICASSP.2017.7952336","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952336","url":null,"abstract":"The Wavelet Transform (WT) in JPEG 2000 is using a ‘separable’ lifting structure, where the one-dimensional (1D) transform is put into multidimensional image signal of its spatial and temporal dimensions. A ‘non-separable’ three-dimensional (3D) structure as the existing method is used to minimize its lifting steps. The ‘non-separable’ 3D structure in the (5,3) type transform for lossless coding is proved to reduce the rounding noise inside it. However, in the (9,7) type transform for lossy coding, the rounding noise inside the ‘non-separable’ 3D structure has increased. This paper proposed a new ‘non-separable’ two-dimensional (2D) structure for integer implementation of a four-dimensional (4D) quadruple lifting WT. Since the order of the original lifting step is preserved, the total amount of the rounding noise observed in pixel values of the decoded image is significantly reduced, and the lossy coding performance for 4D input signal is increased.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124732196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
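The structures discussed in the paper are built from integer lifting steps in which each predict/update output is rounded, so the transform maps integers to integers and stays perfectly invertible; the separable versus non-separable question is how such steps are ordered across the four dimensions. This is a 1D sketch of the (5,3) integer lifting round trip, with simplified boundary handling; it is not the proposed 4D non-separable structure.

```python
import numpy as np

def lift_53_forward(x):
    """Integer (5,3) lifting: split, predict (with rounding), update (with rounding)."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    even_r = np.append(even[1:], even[-1])                            # simple boundary extension
    high = odd - np.floor((even + even_r) / 2).astype(np.int64)       # predict step
    high_l = np.insert(high[:-1], 0, high[0])
    low = even + np.floor((high_l + high + 2) / 4).astype(np.int64)   # update step
    return low, high

def lift_53_inverse(low, high):
    """Invert the update and predict steps in reverse order."""
    high_l = np.insert(high[:-1], 0, high[0])
    even = low - np.floor((high_l + high + 2) / 4).astype(np.int64)
    even_r = np.append(even[1:], even[-1])
    odd = high + np.floor((even + even_r) / 2).astype(np.int64)
    x = np.empty(even.size + odd.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 256, size=16)
    low, high = lift_53_forward(x)
    assert np.array_equal(lift_53_inverse(low, high), x)   # lossless round trip
    print("low-pass:", low, "\nhigh-pass:", high)
```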
Improved cepstra minimum-mean-square-error noise reduction algorithm for robust speech recognition
Jinyu Li, Yan Huang, Y. Gong
{"title":"Improved cepstra minimum-mean-square-error noise reduction algorithm for robust speech recognition","authors":"Jinyu Li, Yan Huang, Y. Gong","doi":"10.1109/ICASSP.2017.7953081","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953081","url":null,"abstract":"In the era of deep learning, although beam-forming multi-channel signal processing is still very helpful, it was reported that single-channel robust front-ends usually cannot benefit deep learning models because the layer-by-layer structure of deep learning models provides a feature extraction strategy that automatically derives powerful noise-resistant features from primitive raw data for senone classification. In this study, we show that the single-channel robust front-end is still very beneficial to deep learning modelling as long as it is well designed. We improve a robust front-end, cepstra minimum mean square error (CMMSE), by using more reliable voice activity detector, refined prior SNR estimation, better gain smoothing and two-stage processing. This new front-end, improved CMMSE (ICMMSE), is evaluated on the standard Aurora 2 and Chime 3 tasks, and a 3400 hour Microsoft Cortana digital assistant task using Gaussian mixture models, feed-forward deep neural networks, and long short-term memory recurrent neural networks, respectively. It is shown that ICMMSE is superior regardless of the underlying acoustic models and the scale of evaluation tasks, with 25.46% relative WER reduction on Aurora 2, up to 11.98% relative WER reduction on Chime 3, and up to 11.01% relative WER reduction on Cortana digital assistant task, respectively.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122525172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
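A generic MMSE-style front end estimates a time-frequency gain from the a posteriori and a priori SNRs, with the a priori SNR tracked by the decision-directed rule; the improvements listed in the abstract (a better VAD, refined prior-SNR estimation, gain smoothing, two-stage processing) refine exactly these quantities. The following sketch shows a plain decision-directed Wiener gain on power spectra, offered as a baseline illustration rather than the ICMMSE algorithm.

```python
import numpy as np

def dd_wiener_enhance(noisy_power, noise_power, alpha=0.98, gain_floor=0.1):
    """Frame-by-frame Wiener gain with decision-directed a priori SNR tracking.

    noisy_power: (num_frames, num_bins) noisy-speech power spectra
    noise_power: (num_bins,) noise power estimate (e.g. from leading noise-only frames)
    Returns the enhanced power spectra.
    """
    num_frames, num_bins = noisy_power.shape
    enhanced = np.zeros_like(noisy_power)
    prev_clean = np.zeros(num_bins)
    for t in range(num_frames):
        post_snr = noisy_power[t] / (noise_power + 1e-12)              # a posteriori SNR
        prio_snr = alpha * prev_clean / (noise_power + 1e-12) \
                   + (1.0 - alpha) * np.maximum(post_snr - 1.0, 0.0)   # decision-directed rule
        gain = np.maximum(prio_snr / (1.0 + prio_snr), gain_floor)     # Wiener gain, floored
        enhanced[t] = (gain ** 2) * noisy_power[t]                     # gain applies to amplitude
        prev_clean = enhanced[t]
    return enhanced

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((100, 129)) ** 2
    noise = 0.2 * np.ones(129)
    noisy = clean + noise * rng.random((100, 129))
    out = dd_wiener_enhance(noisy, noise)
    print("noisy power:", noisy.mean(), "enhanced power:", out.mean())
```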