ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Preconditioned Ghost Imaging Via Sparsity Constraint
Zhishen Tong, Jian Wang, Shensheng Han
{"title":"Preconditioned Ghost Imaging Via Sparsity Constraint","authors":"Zhishen Tong, Jian Wang, Shensheng Han","doi":"10.1109/ICASSP40776.2020.9053414","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053414","url":null,"abstract":"Ghost imaging via sparsity constraint (GISC) can recover objects from the intensity fluctuation of light fields at a sampling rate far below the Nyquist rate. However, its imaging quality may degrade severely when the coherence of sampling matrices is large. To deal with this issue, we propose an efficient recovery algorithm for GISC called the preconditioned multiple orthogonal least squares (PmOLS). Our algorithm consists of two major parts: i) the pseudo-inverse preconditioning (PIP) method refining the coherence of sampling matrices and ii) the multiple orthogonal least squares (mOLS) algorithm recovering the objects. Theoretical analysis shows that PmOLS recovers any n-dimensional K-sparse signal from m random linear samples of the signal with probability exceeding $1 - 3{n^2}{e^{ - cm/{K^2}}}$. Simulations and experiments demonstrate that PmOLS has competitive imaging quality compared to the state-of-the-art approaches.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"246 1","pages":"1484-1488"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79349791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
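The recovery guarantee quoted in the abstract above, success probability exceeding $1 - 3n^2 e^{-cm/K^2}$, can be unpacked with one line of algebra. The sketch below only rearranges that expression to show when the bound is non-trivial (greater than zero); $c$ is the unspecified positive constant from the abstract, and nothing here is taken from the paper's proof.

```latex
1 - 3n^{2}e^{-cm/K^{2}} > 0
\iff e^{-cm/K^{2}} < \frac{1}{3n^{2}}
\iff m > \frac{K^{2}}{c}\,\ln\!\left(3n^{2}\right)
```

Read this way, the bound becomes informative once the number of samples $m$ grows on the order of $K^2 \log n$, which is consistent with the abstract's claim of sampling far below the Nyquist rate for sparse objects.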
End-end Speech-to-Text Translation with Modality Agnostic Meta-Learning
S. Indurthi, HyoJung Han, Nikhil Kumar Lakumarapu, Beomseok Lee, Insoo Chung, Sangha Kim, Chanwoo Kim
{"title":"End-end Speech-to-Text Translation with Modality Agnostic Meta-Learning","authors":"S. Indurthi, HyoJung Han, Nikhil Kumar Lakumarapu, Beomseok Lee, Insoo Chung, Sangha Kim, Chanwoo Kim","doi":"10.1109/ICASSP40776.2020.9054759","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054759","url":null,"abstract":"Collecting large amounts of data to train end-to-end Speech Translation (ST) models is more difficult compared to the ASR and MT tasks. Previous studies have proposed the use of transfer learning approaches to overcome the above difficulty. These approaches benefit from weakly supervised training data, such as ASR speech-to-transcript or MT text-to-text translation pairs. However, the parameters in these models are updated independently of each task, which may lead to sub-optimal solutions. In this work, we adopt a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from source tasks=ASR+MT to target task=ST where the ST task severely lacks data. In the meta-learning phase, parameters are updated in such a way that they act as a good initialization point for the target ST task. We evaluate the proposed meta-learning approach for ST tasks on English-German (En-De) and English-French (En-Fr) language pairs from the Multilingual Speech Translation Corpus (MuST-C). Our method outperforms the previous transfer learning approaches and sets new state-of-the-art results for En-De and En-Fr ST tasks by obtaining 9.18, and 11.76 BLEU point improvements, respectively.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"145 6 1","pages":"7904-7908"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79392323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 31
Indoor Altitude Estimation of Unmanned Aerial Vehicles Using a Bank of Kalman Filters
Liu Yang, Hechuan Wang, Yousef El-Laham, J. Fonte, David Trillo Pérez, M. Bugallo
{"title":"Indoor Altitude Estimation of Unmanned Aerial Vehicles Using a Bank of Kalman Filters","authors":"Liu Yang, Hechuan Wang, Yousef El-Laham, J. Fonte, David Trillo Pérez, M. Bugallo","doi":"10.1109/ICASSP40776.2020.9054203","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054203","url":null,"abstract":"Altitude estimation is important for successful control and navigation of unmanned aerial vehicles (UAVs). UAVs do not have indoor access to GPS signals and can only use on-board sensors for reliable estimation of altitude. Unfortunately, most existing navigation schemes are not robust to the presence of abnormal obstructions above and below the UAV. In this work, we propose a novel strategy for tackling the altitude estimation problem that utilizes multiple model adaptive estimation (MMAE), where the candidate models correspond to four scenarios: no obstacles above and below the UAV; obstacles above the UAV; obstacles below the UAV; and obstacles above and below the UAV. The principle of Occam’s razor ensures that the model that offers the most parsimonious explanation of the sensor data has the most influence in the MMAE algorithm. We validate the proposed scheme on synthetic and real sensor data.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"15 4-5 1","pages":"5455-5459"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84548241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
On the Effect of Reflectance on Phasor Field Non-Line-of-Sight Imaging
Ibón Guillén, Xiaochun Liu, A. Velten, D. Gutierrez, A. Jarabo
{"title":"On the Effect of Reflectance on Phasor Field Non-Line-of-Sight Imaging","authors":"Ibón Guillén, Xiaochun Liu, A. Velten, D. Gutierrez, A. Jarabo","doi":"10.1109/ICASSP40776.2020.9052985","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052985","url":null,"abstract":"Non-line-of-sight (NLOS) imaging aims to visualize occluded scenes by exploiting indirect reflections on visible surfaces. Previous methods approach this problem by inverting the light transport on the hidden scene, but are limited to isolated, diffuse objects. The recently introduced phasor fields framework computationally poses NLOS reconstruction as a virtual line-of-sight (LOS) problem, lifting most assumptions about the hidden scene. In this work we complement recent theoretical analysis of phasor field-based reconstruction, by empirically analyzing the effect of reflectance of the hidden scenes on reconstruction. We experimentally study the reconstruction of hidden scenes composed of objects with increasingly specular materials. Then, we evaluate the effect of the virtual aperture size on the reconstruction, and establish connections between the effect of these two different dimensions on the results. We hope our analysis helps to characterize the imaging capabilities of this promising new framework, and foster new NLOS imaging modalities.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"9269-9273"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84888628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
DGAN: Disentangled Representation Learning for Anisotropic BRDF Reconstruction
Zhongyun Hu, Xue Wang, Qing Wang
{"title":"DGAN: Disentangled Representation Learning for Anisotropic BRDF Reconstruction","authors":"Zhongyun Hu, Xue Wang, Qing Wang","doi":"10.1109/ICASSP40776.2020.9054095","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054095","url":null,"abstract":"Accurate reconstruction of real-world materials’ appearance from a very limited number of samples is still a huge challenge in computer vision and graphics. In this paper, we present a novel deep architecture, Disentangled Generative Adversarial Network (DGAN), which performs anisotropic Bidirectional Reflectance Distribution Function (BRDF) reconstruction from single BRDF subspace with the maximum entropy. In contrast to previous approaches that directly map known samples to a full BRDF using a CNN, a disentangled representation learning is applied to guide the reconstruction process. In order to learn different physical factors of the BRDF, the generator of the DGAN mainly consists of a fresnel estimator module (FEM) and a directional module (DM). Considering the fact that the entropy of different BRDF subspace varies, we further divide the BRDF into He-BRDF and Le-BRDF to reconstruct the interior part and the exterior part of the directional factor. Experimental results show that our approach outperforms state-of-the-art methods.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"4397-4401"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84913573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Robust Global Optimized Affine Registration Method for Microscopic Images of Biological Tissue
Yanan Lv, Xi Chen, Chang Shu, Hua Han
{"title":"Robust Global Optimized Affine Registration Method for Microscopic Images of Biological Tissue","authors":"Yanan Lv, Xi Chen, Chang Shu, Hua Han","doi":"10.1109/ICASSP40776.2020.9054568","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054568","url":null,"abstract":"Affine registration can fit the non-rigid deformation of slices effectively, and it is widely used in volume reconstruction of biological tissue. But most of the existing affine registration methods are registered in a given sequence, which results in the accumulation of errors. In this paper, a global optimized affine registration method is proposed, which can be used in volume reconstruction. To eliminate the cumulative error, the affine transformation of all images is estimated simultaneously based on an energy function. A soft penalty on affine transformation is added to restrict the shearing of images. Experiments show that our method provides a more reliable registration result compared with sequential affine registration. It can solve the problems caused by the accumulation of errors. The registration result fits the deformation of slices well and preserves the rigidity of images.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"1070-1074"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84935246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Time-Frequency Loss for CNN Based Speech Super-Resolution
Heming Wang, Deliang Wang
{"title":"Time-Frequency Loss for CNN Based Speech Super-Resolution","authors":"Heming Wang, Deliang Wang","doi":"10.1109/ICASSP40776.2020.9053712","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053712","url":null,"abstract":"Speech super-resolution (SR), also called speech bandwidth extension (BWE), aims to increase the sampling rate of a given lower resolution speech signal. Recent years have witnessed the successful application of deep neural networks in time or frequency domains, and deep learning has improved the performance considerably compared with conventional approaches. This paper proposes an autoencoder based fully convolutional neural network (CNN) that merges the information from both time and frequency domains. At the training time, we optimize the CNN using a new time-frequency loss (T-F loss), which combines a time domain loss and a frequency domain loss. The experimental results show that our model trained with the T-F loss achieves significantly better results than other state-of-the-art models, and yields balanced performance in terms of time and frequency metrics.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"861-865"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84980801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
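The abstract above describes a T-F loss that combines a time-domain loss with a frequency-domain loss. The sketch below shows one way such a combination could look. The abstract does not give the formula, so everything here is an assumption for illustration: mean squared error in both domains, a naive DFT magnitude spectrum, and a hypothetical weight `alpha` that balances the two terms.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitude spectrum (bins 0..n/2) of a real sequence via a naive DFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def time_loss(pred, target):
    """Mean squared error between waveforms (assumed form)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def freq_loss(pred, target):
    """Mean squared error between magnitude spectra (assumed form)."""
    mp, mt = dft_magnitudes(pred), dft_magnitudes(target)
    return sum((a - b) ** 2 for a, b in zip(mp, mt)) / len(mt)


def tf_loss(pred, target, alpha=0.5):
    """Weighted sum of the two losses; alpha is a hypothetical balance weight."""
    return alpha * time_loss(pred, target) + (1.0 - alpha) * freq_loss(pred, target)
```

With `alpha=1.0` the sketch reduces to a pure waveform loss and with `alpha=0.0` to a pure spectral loss, which makes the trade-off between time and frequency metrics mentioned in the abstract easy to probe.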
Generalized Spatial Modulation for Wireless Terabits Systems Under Sub-THZ Channel With RF Impairments
Majed Saad, F. Bader, A. Ghouwayel, Hussein Hijazi, Nizar Bouhel, J. Palicot
{"title":"Generalized Spatial Modulation for Wireless Terabits Systems Under Sub-THZ Channel With RF Impairments","authors":"Majed Saad, F. Bader, A. Ghouwayel, Hussein Hijazi, Nizar Bouhel, J. Palicot","doi":"10.1109/ICASSP40776.2020.9053208","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053208","url":null,"abstract":"Multiple-Input Multiple-Output (MIMO) technique with Index Modulation (IM) over sub-TeraHertz (sub-THz) bands represents a promising solution to design new wireless ultrahigh data rate systems. However, the system design over sub-THz bands suffers from many technological limitations and severe RF-impairments such as low output power, limited resolution of high-speed low-power Analog-to-Digital Converters and important Phase Noise (PN) introduced by the Local Oscillator (LO). In this paper, different modulation schemes with Generalized Spatial Modulation (GSM) are compared from different perspectives while considering the sub-THz impairments. The effect of PN has been investigated for these modulation schemes in sub-THz channels using uniform linear and rectangular antenna arrays. The obtained results reveal that QPSK-GSM system is the best combination compared to GSM systems with any other M-ary modulation scheme (e.g. PSK, DPSK, QAM, PAM). Compared to DQPSK-GSM and 4PAM-GSM at 12bpcu, same number of receive and activated transmit antennas, the QPSK-GSM system offers a gain ranging from 3.4 dB up to 5 dB. The results reveal that low to medium residual PN in distributed oscillator architecture can be tolerated when using GSM-QPSK without phase noise mitigation. Thus, enforcing the GSM to be a promising candidate for ultra-high wireless data rate communication in sub-THz bands.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"17 1","pages":"5135-5139"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85198100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Gaussian Lpcnet for Multisample Speech Synthesis
Vadim Popov, M. Kudinov, T. Sadekova
{"title":"Gaussian Lpcnet for Multisample Speech Synthesis","authors":"Vadim Popov, M. Kudinov, T. Sadekova","doi":"10.1109/ICASSP40776.2020.9053337","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053337","url":null,"abstract":"LPCNet vocoder has recently been presented to TTS community and is now gaining increasing popularity due to its effectiveness and high quality of the speech synthesized with it. In this work, we present a modification of LPCNet that is 1.5x faster, has twice less non-zero parameters and synthesizes speech of the same quality. Such enhancement is possible mostly due to two features that we introduce into the original architecture: the proposed vocoder is designed to generate 16-bit signal instead of 8-bit µ-companded signal, and it predicts two consecutive excitation values at a time independently of each other. To show that these modifications do not lead to quality degradation we train models for five different languages and perform extensive human evaluation.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"32 1","pages":"6204-6208"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85209595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
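The abstract above contrasts a 16-bit output signal with the 8-bit µ-companded signal of the original vocoder. As background, the sketch below shows textbook µ-law companding: a logarithmic compression of the amplitude followed by uniform quantization to 256 codes. The value µ = 255 and the function names are assumptions chosen for illustration; the abstract does not state the exact quantizer, and this is not code from LPCNet.

```python
import math

MU = 255.0  # conventional value for 8-bit mu-law; assumed, not stated in the abstract


def mu_compress(x, mu=MU):
    """Map an amplitude x in [-1, 1] to [-1, 1] with logarithmic compression."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize_8bit(x):
    """Compand, then round to one of 256 integer codes (0..255)."""
    y = mu_compress(x)
    return int(round((y + 1.0) / 2.0 * 255.0))
```

Companding spends more of the 256 codes on small amplitudes, which is why 8 bits can be workable for speech; generating a 16-bit signal directly, as the abstract proposes, removes this quantization step.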
Robust Hybrid Beamforming for Satellite-Terrestrial Integrated Networks
Zhi Lin, Min Lin, B. Champagne, Wei-Ping Zhu, N. Al-Dhahir
{"title":"Robust Hybrid Beamforming for Satellite-Terrestrial Integrated Networks","authors":"Zhi Lin, Min Lin, B. Champagne, Wei-Ping Zhu, N. Al-Dhahir","doi":"10.1109/ICASSP40776.2020.9053756","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053756","url":null,"abstract":"In this paper, we propose a novel robust downlink beamforming (BF) design for satellite-terrestrial integrated networks. Under a realistic assumption that the angular information of eavesdroppers is not perfectly known, we establish an optimization framework for hybrid BF at the terrestrial base station and digital BF at the satellite to maximize the secrecy-energy efficiency of the system, while satisfying the quality-of-service constraints of both earth station and cellular user. Since the formulated optimization problem is mathematically intractable, we present an iterative algorithm based on the Charnes-Cooper approach to optimize the BF weight vectors. The effectiveness and superiority of the proposed robust hybrid BF scheme are validated via computer simulations.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"8792-8796"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85228409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4