Title: Compressed Sensing Based Channel Estimation and Open-loop Training Design for Hybrid Analog-digital Massive MIMO Systems
Authors: Khaled Ardah, Bruno Sokal, A. D. Almeida, M. Haardt
Venue: ICASSP 2020, pp. 4597-4601. DOI: 10.1109/ICASSP40776.2020.9054443
Abstract: Channel estimation in hybrid analog-digital massive MIMO systems is a challenging problem due to the high channel dimension, the low signal-to-noise ratio before beamforming, and the reduced number of radio-frequency chains. Compressed sensing algorithms have been adopted to address these challenges by leveraging the sparse nature of millimeter-wave MIMO channels. In compressed sensing-based methods, the training vectors must be designed carefully to guarantee recoverability. Although random training vectors provide a recoverability guarantee with overwhelming probability, it has recently been shown that an optimized design, obtained by minimizing the mutual coherence of the resulting sensing matrix, can improve this guarantee. In this paper, we propose an open-loop hybrid analog-digital beam-training framework in which a given sensing matrix is decomposed into analog and digital beamformers. The sensing matrix can be designed efficiently offline to reduce computational complexity. Simulation results show that the proposed training method achieves lower mutual coherence and better channel estimation performance than the benchmark methods.
{"title":"Joint Training of Deep Neural Networks for Multi-Channel Dereverberation and Speech Source Separation","authors":"M. Togami","doi":"10.1109/ICASSP40776.2020.9053791","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053791","url":null,"abstract":"In this paper, we propose a joint training of two deep neural networks (DNNs) for dereverberation and speech source separation. The proposed method connects the first DNN, the dereverberation part, the second DNN, and the speech source separation part in a cascade manner. The proposed method does not train each DNN separately. Instead, an integrated loss function which evaluates an output signal after dereverberation and speech source separation is adopted. The proposed method estimates the output signal as a probabilistic variable. Recently, in the speech source separation context, we proposed a loss function which evaluates the estimated posterior probability density function (PDF) of the output signal. In this paper, we extend this loss function into a loss function which evaluates not only speech source separation performance but also speech derevereberation performance. Since the output signal of the dereverberation part is converted into the input feature of the second DNN, gradient of the loss function is back-propagated into the first DNN through the input feature of the second DNN. Experimental results show that the proposed joint training of two DNNs is effective. It is also shown that the posterior PDF based loss function is effective in the joint training context.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"3032-3036"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81344185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deblurring And Super-Resolution Using Deep Gated Fusion Attention Networks For Face Images","authors":"Chao Yang, Long-Wen Chang","doi":"10.1109/ICASSP40776.2020.9053784","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053784","url":null,"abstract":"Image deblurring and super-resolution are very important in image processing such as face verification. However, when in the outdoors, we often get blurry and low resolution images. To solve the problem, we propose a deep gated fusion attention network (DGFAN) to generate a high resolution image without blurring artifacts. We extract features from two task-independent structures for deburring and super-resolution to avoid the error propagation in the cascade structure of deblurring and super-resolution. We also add an attention module in our network by using channel-wise and spatial-wise features for better features and propose an edge loss function to make the model focus on facial features like eyes and nose. DGFAN performs favorably against the state-of-arts methods in terms of PSNR and SSIM. Also, using the clear images generated by DGFAN can improve the accuracy on face verification.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"1623-1627"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81372307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Adversarial Multi-Task Learning for Speaker Normalization in Replay Detection
Authors: Gajan Suthokumar, V. Sethu, Kaavya Sriskandaraja, E. Ambikairajah
Venue: ICASSP 2020, pp. 6609-6613. DOI: 10.1109/ICASSP40776.2020.9054322
Abstract: Spoofing detection algorithms in voice biometrics are adversely affected by differences in the speech characteristics of the various target users. In this paper, we propose a novel speaker normalisation technique that employs adversarial multi-task learning to compensate for this speaker variability. The proposed system learns a feature space that discriminates between genuine and replayed speech while simultaneously reducing the discrimination between different speakers. We first characterise the impact of speaker variability and quantify the effect of the proposed speaker normalisation technique directly on the feature distributions. We then validate the technique in spoofing detection experiments on two corpora, ASVSpoof 2017 v2.0 and BTAS 2016 replay, and demonstrate its effectiveness, obtaining EERs of 7.11% and 0.83% respectively, lower than those of all relevant baselines.
Title: A Differential Approach for Rain Field Tomographic Reconstruction Using Microwave Signals from Leo Satellites
Authors: Xi Shen, D. Huang, C. Vincent, Wenxiao Wang, R. Togneri
Venue: ICASSP 2020, pp. 9001-9005. DOI: 10.1109/ICASSP40776.2020.9054284
Abstract: A differential approach is proposed for tomographic rain field reconstruction using the estimated signal-to-noise ratio of microwave signals received on the ground from low earth orbit satellites, in which the unknown baseline values are eliminated before least squares is used to reconstruct the attenuation field. Simulations are carried out both when the baseline is modelled by an autoregressive process and when it is assumed fixed. Comparisons between the differential and non-differential reconstructions suggest that the differential approach performs better in both scenarios; for a high correlation coefficient and low model noise in the autoregressive process, it surpasses the non-differential approach significantly.
{"title":"Stochastic Graph Neural Networks","authors":"Zhan Gao, E. Isufi, Alejandro Ribeiro","doi":"10.1109/ICASSP40776.2020.9054424","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054424","url":null,"abstract":"Graph neural networks (GNNs) model nonlinear representations in graph data with applications in distributed agent coordination, control, and planning among others. However, current GNN implementations assume ideal distributed scenarios and ignore link fluctuations that occur due to environment or human factors. In these situations, the GNN fails to address its distributed task if the topological randomness is not considered accordingly. To overcome this issue, we put forth the stochastic graph neural network (SGNN) model: a GNN where the distributed graph convolutional operator is modified to account for the network changes. Since stochasticity brings in a new paradigm, we develop a novel learning process for the SGNN and introduce the stochastic gradient descent (SGD) algorithm to estimate the parameters. We prove through the SGD that the SGNN learning process converges to a stationary point under mild Lipschitz assumptions. Numerical simulations corroborate the proposed theory and show an improved performance of the SGNN compared with the conventional GNN when operating over random time varying graphs.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"2007 1","pages":"9080-9084"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82504172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Playing Technique Recognition by Joint Time–Frequency Scattering
Authors: Changhong Wang, V. Lostanlen, Emmanouil Benetos, E. Chew
Venue: ICASSP 2020, pp. 881-885. DOI: 10.1109/ICASSP40776.2020.9053474
Abstract: Playing techniques are important expressive elements in music signals. In this paper, we propose a recognition system based on the joint time–frequency scattering transform (jTFST) for pitch evolution-based playing techniques (PETs), a group of playing techniques with monotonic pitch changes over time. The jTFST represents spectro-temporal patterns in the time–frequency domain, capturing discriminative information about PETs. As a case study, we analyse three commonly used PETs of the Chinese bamboo flute: acciaccatura, portamento, and glissando, and encode their characteristics using the jTFST. To verify the proposed approach, we create a new dataset, CBF-petsDB, containing PETs played in isolation as well as in the context of whole pieces performed and annotated by professional players. Feeding the jTFST into a machine learning classifier, we obtain F-measures of 71% for acciaccatura, 59% for portamento, and 83% for glissando detection, and we provide explanatory visualisations of the scattering coefficients for each technique.
{"title":"An Efficient Augmented Lagrangian-Based Method for Linear Equality-Constrained Lasso","authors":"Zengde Deng, Man-Chung Yue, A. M. So","doi":"10.1109/ICASSP40776.2020.9053722","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053722","url":null,"abstract":"Variable selection is one of the most important tasks in statistics and machine learning. To incorporate more prior information about the regression coefficients, various constrained Lasso models have been proposed in the literature. Compared with the classic (unconstrained) Lasso model, the algorithmic aspects of constrained Lasso models are much less explored. In this paper, we demonstrate how the recently developed semis-mooth Newton-based augmented Lagrangian framework can be extended to solve a linear equality-constrained Lasso model. A key technical challenge that is not present in prior works is the lack of strong convexity in our dual problem, which we overcome by adopting a regularization strategy. We show that under mild assumptions, our proposed method will converge superlinearly. Moreover, extensive numerical experiments on both synthetic and real-world data show that our method can be substantially faster than existing first-order methods while achieving a better solution accuracy.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"11 1","pages":"5760-5764"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78894006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Volume Reconstruction for Light Field Microscopy
Authors: Herman Verinaz-Jadan, P. Song, Carmel L. Howe, Amanda J. Foust, P. Dragotti
Venue: ICASSP 2020, pp. 1459-1463. DOI: 10.1109/ICASSP40776.2020.9053433
Abstract: Light Field Microscopy (LFM) is a 3D imaging technique that captures volumetric information in a single snapshot. It is appealing in microscopy because of its simple implementation and because it is much faster than scanning-based methods. However, volume reconstruction for LFM suffers from low lateral resolution, high computational cost, and reconstruction artifacts near the native object plane. In this work, we make two contributions. First, we propose a simplification of the forward model based on a novel discretization approach that allows us to accelerate the computation without drastically increasing memory consumption. Second, we show experimentally that including regularization priors and an appropriate initialization strategy removes the artifacts near the native object plane; the algorithm we use for this is ADMM. The combination of the two techniques yields a method that outperforms classic volume reconstruction approaches (variants of Richardson-Lucy) in both average computational time and image quality (PSNR).
{"title":"Supervised Deep Hashing for Efficient Audio Event Retrieval","authors":"Arindam Jati, Dimitra Emmanouilidou","doi":"10.1109/ICASSP40776.2020.9053766","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053766","url":null,"abstract":"Efficient retrieval of audio events can facilitate real-time implementation of numerous query and search-based systems. This work investigates the potency of different hashing techniques for efficient audio event retrieval. Multiple state-of-the-art weak audio embeddings are employed for this purpose. The performance of four classical unsupervised hashing algorithms is explored as part of off-the-shelf analysis. Then, we propose a partially supervised deep hashing framework that transforms the weak embeddings into a low-dimensional space while optimizing for efficient hash codes. The model uses only a fraction of the available labels and is shown here to significantly improve the retrieval accuracy on two widely employed audio event datasets. The extensive analysis and comparison between supervised and unsupervised hashing methods presented here, give insights on the quantizability of audio embeddings. This work provides a first look in efficient audio event retrieval systems and hopes to set baselines for future research.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"4497-4501"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76293284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}