B. K. Khonglah, S. Madikeri, S. Dey, H. Bourlard, P. Motlícek, J. Billa
{"title":"Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition","authors":"B. K. Khonglah, S. Madikeri, S. Dey, H. Bourlard, P. Motlícek, J. Billa","doi":"10.1109/ICASSP40776.2020.9054309","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054309","url":null,"abstract":"In this work, we explore a data scheduling strategy for semi-supervised learning (SSL) for acoustic modeling in automatic speech recognition. The conventional approach uses a seed model trained with supervised data to automatically recognize the entire set of unlabeled (auxiliary) data to generate new labels for subsequent acoustic model training. In this paper, we propose an approach in which the unlabeled set is divided into multiple equal-sized subsets. These subsets are processed in an incremental fashion: in each iteration, a new subset is added to the data used for SSL, starting from only one subset in the first iteration. The acoustic model from the previous iteration becomes the seed model for the next one. This scheduling strategy is compared to the approach employing all unlabeled data in one shot for training. Experiments using lattice-free maximum mutual information based acoustic model training on Fisher English give an 80% word error recovery rate. On the multi-genre evaluation sets for Lithuanian and Bulgarian, relative improvements of up to 17.2% in word error rate are observed.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"332 1","pages":"7419-7423"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76584036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Epileptic Seizure Onset-Offset Detection Based On CNN in Scalp EEG","authors":"P. Boonyakitanont, Apiwat Lek-uthai, J. Songsiri","doi":"10.1109/ICASSP40776.2020.9053143","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053143","url":null,"abstract":"We establish a deep learning-based method to automatically detect epileptic seizure onsets and offsets in multi-channel electroencephalography (EEG) signals. A convolutional neural network (CNN) is designed to identify occurrences of seizures in EEG epochs from the EEG signals and an onset-offset detector is proposed to determine the seizure onsets and offsets. The EEG signals are considered as inputs and the outputs are the onset and offset. In the CNN, a filter is factorized to separately capture temporal and spatial patterns in EEG epochs. Moreover, we develop an onset-offset detection method based on clinical decision criteria. As a result, verified on the whole CHB-MIT Scalp EEG database, the CNN model correctly detected seizure activities over 90%. Furthermore, combined with the onset-offset detector, this method achieved an F1 score of 64.40% and essentially determined the seizure onset and offset with absolute onset and offset latencies of 5.83 and 10.12 seconds, respectively.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"1225-1229"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77397373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alankar Kowtal, A. Cramer, Dufan Wu, Kai Yang, Wolfgang Krull, Ioannis Gkioulekas, Rajiv Gupta
{"title":"Signal Sensing and Reconstruction Paradigms for a Novel Multi-Source Static Computed Tomography System","authors":"Alankar Kowtal, A. Cramer, Dufan Wu, Kai Yang, Wolfgang Krull, Ioannis Gkioulekas, Rajiv Gupta","doi":"10.1109/ICASSP40776.2020.9054146","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054146","url":null,"abstract":"Conventional Computed Tomography (CT) systems use a single X-ray source and an arc of detectors mounted on a rotating gantry to acquire a set of projection data. Novel CT systems are now being pioneered in which a complete ring of distributed X-ray sources and detectors are electronically turned on and off, without any mechanical motion, to acquire a set of projections for tomographic reconstruction. This paper discusses new sensing and reconstruction paradigms enabled by this new CT architecture.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1","pages":"9274-9278"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77689864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Alexandropoulos, Evangelos Vlachos, J. Thompson
{"title":"Wideband Channel Tracking for Millimeter Wave Massive MIMO Systems with Hybrid Beamforming Reception","authors":"G. Alexandropoulos, Evangelos Vlachos, J. Thompson","doi":"10.1109/ICASSP40776.2020.9053440","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053440","url":null,"abstract":"Millimeter Wave (mmWave) massive Multiple Input Multiple Output (MIMO) channel tracking is a challenging task with Hybrid analog and digital BeamForming (HBF) reception architectures. The wireless channel can only be spatially sampled with directive analog beams, which results in lengthy training periods when beam codebooks are large. In this paper, we capitalize on a recently proposed HBF architecture enabling mmWave massive MIMO channel estimation with short beam training overhead, and present a matrix-completion-based channel tracking technique for time-correlated HBF receivers. The considered channel tracking problem is formulated as a constrained multi-objective optimization problem incorporating the low-rank and group-sparse properties of the mmWave channel as well as a popular model for its time correlation. We present an efficient algorithm for this estimation problem that is based on the alternating direction method of multipliers. Comparisons of the proposed approach with representative state-of-the-art techniques showcase the relation between the channel time correlation coefficient and the amount of beam training needed for acceptable channel estimation performance.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"16 1","pages":"8698-8702"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77710074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fixed-Point Optimization of Transformer Neural Network","authors":"Yoonho Boo, Wonyong Sung","doi":"10.1109/ICASSP40776.2020.9054724","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054724","url":null,"abstract":"The Transformer model adopts a self-attention structure and shows very good performance in various natural language processing tasks. However, it is difficult to implement the Transformer in embedded systems because of its very large model size. In this study, we quantize the parameters and hidden signals of the Transformer for complexity reduction. Not only the weight and embedding matrices but also the input and the softmax outputs are quantized to utilize low-precision matrix multiplication. The fixed-point optimization steps consist of quantization sensitivity analysis, hardware-conscious word-length assignment, quantization and retraining, and post-training for improved generalization. We achieved a BLEU score of 27.51 on the WMT English-to-German translation task with 4-bit weights and 6-bit hidden signals.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"85 1","pages":"1753-1757"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79833383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fully-Neural Approach to Heavy Vehicle Detection on Bridges Using a Single Strain Sensor","authors":"T. Kawakatsu, K. Aihara, A. Takasu, J. Adachi","doi":"10.1109/ICASSP40776.2020.9053137","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053137","url":null,"abstract":"Bridge weigh-in-motion (BWIM) is a technique for detecting heavy vehicles that may cause serious damage to real bridges. BWIM is realized by analyzing the strain signals observed at places on the bridge in terms of bridge-component responses to the axle loads. In current practice, a BWIM system requires multiple strain sensors to collect vehicle properties including speed and axle positions for accurate load estimation, which may limit the system’s life-span. Furthermore, BWIM should consider a wide variety of waveforms, which may be caused by vehicle acceleration and/or the various traveling positions in lanes. In this paper, we propose a novel BWIM mechanism, which employs a deep convolutional neural network (CNN). The CNN is able to learn actual traffic conditions and achieve accurate load estimation by using only a single strain sensor. The training dataset is collected from a distant load meter, by consulting traffic surveillance cameras and identifying similar vehicles. After the system initialization, the CNN requires no additional sensors (or cameras) for axle detection, which may reduce the costs of both installation and system maintenance.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"45 1","pages":"3047-3051"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80076988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Mixture Population Monte Carlo Via Stochastic Optimization and Markov Chain Monte Carlo Sampling","authors":"Yousef El-Laham, P. Djurić, M. Bugallo","doi":"10.1109/ICASSP40776.2020.9053410","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053410","url":null,"abstract":"The population Monte Carlo (PMC) algorithm is a popular adaptive importance sampling (AIS) method used for approximate computation of intractable integrals. Over the years, many advances have been made in the theory and implementation of PMC schemes. The mixture PMC (M-PMC) algorithm, for instance, optimizes the parameters of a mixture proposal distribution in a way that minimizes the Kullback-Leibler divergence to the target distribution. The parameters in M-PMC are updated using a single step of expectation maximization (EM), which limits its accuracy. In this work, we introduce a novel M-PMC algorithm that optimizes the parameters of a mixture proposal distribution, where parameter updates are resolved via stochastic optimization instead of EM. The stochastic gradients w.r.t. each of the mixture parameters are approximated using a population of Markov chain Monte Carlo samplers. We validate the proposed scheme via numerical simulations on an example where the considered target distribution is multimodal.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"48 1","pages":"5475-5479"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80291874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast Proximal Point Algorithm for Generalized Graph Laplacian Learning","authors":"Zengde Deng, A. M. So","doi":"10.1109/ICASSP40776.2020.9054185","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054185","url":null,"abstract":"Graph learning is one of the most important tasks in machine learning, statistics and signal processing. In this paper, we focus on the problem of learning the generalized graph Laplacian (GGL) and propose an efficient algorithm to solve it. We first fully exploit the sparsity structure hidden in the objective function by utilizing a soft-thresholding technique to transform the GGL problem into an equivalent problem. Moreover, we propose a fast proximal point algorithm (PPA) to solve the transformed GGL problem and establish the linear convergence rate of our algorithm. Extensive numerical experiments on both synthetic data and real data demonstrate that the soft-thresholding technique accelerates our PPA method and that PPA can outperform the current state-of-the-art method in terms of speed.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1 1","pages":"5425-5429"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80404967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Non-Local Cascading Network with Attention Mechanism for Hyperspectral Image Denoising","authors":"Hanwen Ma, Ganchao Liu, Yuan Yuan","doi":"10.1109/ICASSP40776.2020.9054630","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054630","url":null,"abstract":"Because of the complexity of the imaging environment, hyperspectral remote sensing images (HSIs) often suffer from different kinds of noise. Despite the success in natural image denoising, most of the existing CNN-based HSI denoising methods still suffer from the problem of inadequate noise suppression and insufficient feature extraction. In this paper, a novel HSI denoising algorithm based on an enhanced non-local cascading network with attention mechanism (ENCAM) is proposed, which can extract the joint spatial-spectral feature more effectively. The main contributions include: (1) the non-local structure is introduced to enlarge the receptive field to extract the spatial features more effectively; (2) multi-scale convolutions and a channel attention module are applied to enhance extracted multi-scale features; (3) a cascading residual dense structure is used to extract different frequency features. Both the theoretical analysis and the experiments indicate that the proposed method is superior to the other state-of-the-art methods on HSI denoising.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"6 1","pages":"2448-2452"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79116148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yi Chen, S. Kwong, Mingliang Zhou, Shiqi Wang, Guopu Zhu, Yi Wang
{"title":"Intra Frame Rate Control for Versatile Video Coding with Quadratic Rate-Distortion Modelling","authors":"Yi Chen, S. Kwong, Mingliang Zhou, Shiqi Wang, Guopu Zhu, Yi Wang","doi":"10.1109/ICASSP40776.2020.9054633","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054633","url":null,"abstract":"Although numerous coding tools have been adopted in the forthcoming Versatile Video Coding (VVC) standard, much less work has been dedicated to studying the corresponding Rate-Distortion (R-D) characteristics. This paper proposes a new quadratic R-D model for Versatile Video Coding. In particular, based on the proposed model, a new R-λ relationship is derived and used for frame-level rate control. The rate control algorithm is implemented on the VTM 2.0 platform for intra coding scenarios. Compared to the default rate control algorithm in VTM 2.0, experimental results show that the proposed rate control algorithm can achieve a 0.77% BD-BR reduction with similar control accuracy.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"305 1","pages":"4422-4426"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79341100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}