{"title":"Preconditioned Ghost Imaging Via Sparsity Constraint","authors":"Zhishen Tong, Jian Wang, Shensheng Han","doi":"10.1109/ICASSP40776.2020.9053414","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053414","url":null,"abstract":"Ghost imaging via sparsity constraint (GISC) can recover objects from the intensity fluctuation of light fields at a sampling rate far below the Nyquist rate. However, its imaging quality may degrade severely when the coherence of sampling matrices is large. To deal with this issue, we propose an efficient recovery algorithm for GISC called the preconditioned multiple orthogonal least squares (PmOLS). Our algorithm consists of two major parts: i) the pseudo-inverse preconditioning (PIP) method refining the coherence of sampling matrices and ii) the multiple orthogonal least squares (mOLS) algorithm recovering the objects. Theoretical analysis shows that PmOLS recovers any n-dimensional K-sparse signal from m random linear samples of the signal with probability exceeding $1 - 3{n^2}{e^{ - cm/{K^2}}}$. Simulations and experiments demonstrate that PmOLS has competitive imaging quality compared to the state-of-the-art approaches.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"246 1","pages":"1484-1488"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79349791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"End-end Speech-to-Text Translation with Modality Agnostic Meta-Learning","authors":"S. Indurthi, HyoJung Han, Nikhil Kumar Lakumarapu, Beomseok Lee, Insoo Chung, Sangha Kim, Chanwoo Kim","doi":"10.1109/ICASSP40776.2020.9054759","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054759","url":null,"abstract":"Collecting large amounts of data to train end-to-end Speech Translation (ST) models is more difficult compared to the ASR and MT tasks. Previous studies have proposed the use of transfer learning approaches to overcome the above difficulty. These approaches benefit from weakly supervised training data, such as ASR speech-to-transcript or MT text-to-text translation pairs. However, the parameters in these models are updated independently of each task, which may lead to sub-optimal solutions. In this work, we adopt a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from source tasks=ASR+MT to target task=ST where the ST task severely lacks data. In the meta-learning phase, parameters are updated in such a way that they act as a good initialization point for the target ST task. We evaluate the proposed meta-learning approach for ST tasks on English-German (En-De) and English-French (En-Fr) language pairs from the Multilingual Speech Translation Corpus (MuST-C). Our method outperforms the previous transfer learning approaches and sets new state-of-the-art results for En-De and En-Fr ST tasks by obtaining 9.18 and 11.76 BLEU point improvements, respectively.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"145 6 1","pages":"7904-7908"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79392323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Indoor Altitude Estimation of Unmanned Aerial Vehicles Using a Bank of Kalman Filters","authors":"Liu Yang, Hechuan Wang, Yousef El-Laham, J. Fonte, David Trillo Pérez, M. Bugallo","doi":"10.1109/ICASSP40776.2020.9054203","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054203","url":null,"abstract":"Altitude estimation is important for successful control and navigation of unmanned aerial vehicles (UAVs). UAVs do not have indoor access to GPS signals and can only use on-board sensors for reliable estimation of altitude. Unfortunately, most existing navigation schemes are not robust to the presence of abnormal obstructions above and below the UAV. In this work, we propose a novel strategy for tackling the altitude estimation problem that utilizes multiple model adaptive estimation (MMAE), where the candidate models correspond to four scenarios: no obstacles above and below the UAV; obstacles above the UAV; obstacles below the UAV; and obstacles above and below the UAV. The principle of Occam’s razor ensures that the model that offers the most parsimonious explanation of the sensor data has the most influence in the MMAE algorithm. We validate the proposed scheme on synthetic and real sensor data.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"15 4-5 1","pages":"5455-5459"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84548241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Effect of Reflectance on Phasor Field Non-Line-of-Sight Imaging","authors":"Ibón Guillén, Xiaochun Liu, A. Velten, D. Gutierrez, A. Jarabo","doi":"10.1109/ICASSP40776.2020.9052985","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052985","url":null,"abstract":"Non-line-of-sight (NLOS) imaging aims to visualize occluded scenes by exploiting indirect reflections on visible surfaces. Previous methods approach this problem by inverting the light transport on the hidden scene, but are limited to isolated, diffuse objects. The recently introduced phasor fields framework computationally poses NLOS reconstruction as a virtual line-of-sight (LOS) problem, lifting most assumptions about the hidden scene. In this work we complement recent theoretical analysis of phasor field-based reconstruction, by empirically analyzing the effect of reflectance of the hidden scenes on reconstruction. We experimentally study the reconstruction of hidden scenes composed of objects with increasingly specular materials. Then, we evaluate the effect of the virtual aperture size on the reconstruction, and establish connections between the effect of these two different dimensions on the results. We hope our analysis helps to characterize the imaging capabilities of this promising new framework, and foster new NLOS imaging modalities.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"9269-9273"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84888628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DGAN: Disentangled Representation Learning for Anisotropic BRDF Reconstruction","authors":"Zhongyun Hu, Xue Wang, Qing Wang","doi":"10.1109/ICASSP40776.2020.9054095","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054095","url":null,"abstract":"Accurate reconstruction of real-world materials’ appearance from a very limited number of samples is still a huge challenge in computer vision and graphics. In this paper, we present a novel deep architecture, Disentangled Generative Adversarial Network (DGAN), which performs anisotropic Bidirectional Reflectance Distribution Function (BRDF) reconstruction from single BRDF subspace with the maximum entropy. In contrast to previous approaches that directly map known samples to a full BRDF using a CNN, a disentangled representation learning is applied to guide the reconstruction process. In order to learn different physical factors of the BRDF, the generator of the DGAN mainly consists of a fresnel estimator module (FEM) and a directional module (DM). Considering the fact that the entropy of different BRDF subspace varies, we further divide the BRDF into He-BRDF and Le-BRDF to reconstruct the interior part and the exterior part of the directional factor. Experimental results show that our approach outperforms state-of-the-art methods.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"4397-4401"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84913573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Global Optimized Affine Registration Method for Microscopic Images of Biological Tissue","authors":"Yanan Lv, Xi Chen, Chang Shu, Hua Han","doi":"10.1109/ICASSP40776.2020.9054568","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054568","url":null,"abstract":"Affine registration can fit the non-rigid deformation of slices effectively, and it is widely used in volume reconstruction of biological tissue. But most of the existing affine registration methods are registered in a given sequence, which results in the accumulation of errors. In this paper, a global optimized affine registration method is proposed, which can be used in volume reconstruction. To eliminate the cumulative error, the affine transformation of all images is estimated simultaneously based on an energy function. A soft penalty on affine transformation is added to restrict the shearing of images. Experiments show that our method provides a more reliable registration result compared with sequential affine registration. It can solve the problems caused by the accumulation of errors. The registration result fits the deformation of slices well and preserves the rigidity of images.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"1070-1074"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84935246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-Frequency Loss for CNN Based Speech Super-Resolution","authors":"Heming Wang, Deliang Wang","doi":"10.1109/ICASSP40776.2020.9053712","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053712","url":null,"abstract":"Speech super-resolution (SR), also called speech bandwidth extension (BWE), aims to increase the sampling rate of a given lower resolution speech signal. Recent years have witnessed the successful application of deep neural networks in time or frequency domains, and deep learning has improved the performance considerably compared with conventional approaches. This paper proposes an autoencoder based fully convolutional neural network (CNN) that merges the information from both time and frequency domains. At the training time, we optimize the CNN using a new time-frequency loss (T-F loss), which combines a time domain loss and a frequency domain loss. The experimental results show that our model trained with the T-F loss achieves significantly better results than other state-of-the-art models, and yields balanced performance in terms of time and frequency metrics.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"861-865"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84980801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Spatial Modulation for Wireless Terabits Systems Under Sub-THZ Channel With RF Impairments","authors":"Majed Saad, F. Bader, A. Ghouwayel, Hussein Hijazi, Nizar Bouhel, J. Palicot","doi":"10.1109/ICASSP40776.2020.9053208","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053208","url":null,"abstract":"The Multiple-Input Multiple-Output (MIMO) technique with Index Modulation (IM) over sub-TeraHertz (sub-THz) bands represents a promising solution to design new wireless ultra-high data rate systems. However, the system design over sub-THz bands suffers from many technological limitations and severe RF impairments such as low output power, limited resolution of high-speed low-power Analog-to-Digital Converters and significant Phase Noise (PN) introduced by the Local Oscillator (LO). In this paper, different modulation schemes with Generalized Spatial Modulation (GSM) are compared from different perspectives while considering the sub-THz impairments. The effect of PN has been investigated for these modulation schemes in sub-THz channels using uniform linear and rectangular antenna arrays. The obtained results reveal that the QPSK-GSM system is the best combination compared to GSM systems with any other M-ary modulation scheme (e.g. PSK, DPSK, QAM, PAM). Compared to DQPSK-GSM and 4PAM-GSM at 12 bpcu with the same number of receive and activated transmit antennas, the QPSK-GSM system offers a gain ranging from 3.4 dB up to 5 dB. The results reveal that low to medium residual PN in a distributed oscillator architecture can be tolerated when using GSM-QPSK without phase noise mitigation. This reinforces GSM as a promising candidate for ultra-high wireless data rate communication in sub-THz bands.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"17 1","pages":"5135-5139"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85198100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian Lpcnet for Multisample Speech Synthesis","authors":"Vadim Popov, M. Kudinov, T. Sadekova","doi":"10.1109/ICASSP40776.2020.9053337","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053337","url":null,"abstract":"The LPCNet vocoder has recently been presented to the TTS community and is now gaining increasing popularity due to its effectiveness and the high quality of the speech synthesized with it. In this work, we present a modification of LPCNet that is 1.5x faster, has half as many non-zero parameters and synthesizes speech of the same quality. This enhancement is possible mostly due to two features that we introduce into the original architecture: the proposed vocoder is designed to generate a 16-bit signal instead of an 8-bit µ-companded signal, and it predicts two consecutive excitation values at a time independently of each other. To show that these modifications do not lead to quality degradation, we train models for five different languages and perform extensive human evaluation.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"32 1","pages":"6204-6208"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85209595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Hybrid Beamforming for Satellite-Terrestrial Integrated Networks","authors":"Zhi Lin, Min Lin, B. Champagne, Wei-Ping Zhu, N. Al-Dhahir","doi":"10.1109/ICASSP40776.2020.9053756","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053756","url":null,"abstract":"In this paper, we propose a novel robust downlink beamforming (BF) design for satellite-terrestrial integrated networks. Under a realistic assumption that the angular information of eavesdroppers is not perfectly known, we establish an optimization framework for hybrid BF at the terrestrial base station and digital BF at the satellite to maximize the secrecy-energy efficiency of the system, while satisfying the quality-of-service constraints of both earth station and cellular user. Since the formulated optimization problem is mathematically intractable, we present an iterative algorithm based on the Charnes-Cooper approach to optimize the BF weight vectors. The effectiveness and superiority of the proposed robust hybrid BF scheme are validated via computer simulations.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"8792-8796"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85228409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}