{"title":"Distributed Non-Orthogonal Pilot Design for Multi-Cell Massive MIMO Systems","authors":"Yue Wu, Shaodan Ma, Yuantao Gu","doi":"10.1109/ICASSP40776.2020.9053224","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053224","url":null,"abstract":"In this work, a distributed non-orthogonal pilot design approach is proposed to tackle the pilot contamination problem in multi-cell massive multiple-input multiple-output (MIMO) systems. The pilot signals are designed under power constraints by minimizing the total mean square errors (MSEs) of the minimum mean square error (MMSE) channel estimators of all base stations (BSs). In order to solve the above non-convex pilot design problem, the stochastic variance reduced gradient (SVRG) projection algorithm is introduced, where the pilot signals are optimized in a distributed way at individual BSs. The SVRG projection algorithm preserves the randomness of the transient gradient, which makes the solution more likely to jump out of local minima. Moreover, only a subset of the BSs is activated to perform the gradient descent operation during each iteration, producing a green and low-cost infrastructure. Numerical simulations demonstrate the superiority of the proposed approach in terms of the channel estimation accuracy and uplink achievable sum rate.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"45 15 1","pages":"5195-5199"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72729118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coupled Training of Sequence-to-Sequence Models for Accented Speech Recognition","authors":"Vinit Unni, Nitish Joshi, P. Jyothi","doi":"10.1109/ICASSP40776.2020.9052912","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052912","url":null,"abstract":"Accented speech poses significant challenges for state-of-the-art automatic speech recognition (ASR) systems. Accent is a property of speech that lasts throughout an utterance in varying degrees of strength. This makes it hard to isolate the influence of accent on individual speech sounds. We propose coupled training for encoder-decoder ASR models that acts on pairs of utterances corresponding to the same text spoken by speakers with different accents. This training regime introduces an L2 loss between the attention-weighted representations corresponding to pairs of utterances with the same text, thus acting as a regularizer and encouraging representations from the encoder to be more accent-invariant. We focus on recognizing accented English samples from the Mozilla Common Voice corpus. We obtain significant error rate reductions on accented samples from a large set of diverse accents using coupled training. We also show consistent improvements in performance on heavily accented samples (as determined by a standalone accent classifier).","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"238 1","pages":"8254-8258"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72743569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Griffin–Lim Type Signal Recovery from Magnitude Spectrogram","authors":"Ryusei Nakatsu, D. Kitahara, A. Hirabayashi","doi":"10.1109/ICASSP40776.2020.9053576","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053576","url":null,"abstract":"Speech and audio signal processing frequently requires recovering a time-domain signal from the magnitude of a spectrogram. Conventional methods inversely transform the magnitude spectrogram with a phase spectrogram recovered by the Griffin–Lim algorithm or its accelerated versions. The short-time Fourier transform (STFT) perfectly matches this framework, while other useful spectrogram transforms, such as the constant-Q transform (CQT), do not, because their inverses cannot be computed easily. To make the best of such useful spectrogram transforms, we propose an algorithm which recovers the time-domain signal without the inverse spectrogram transforms. We formulate the signal recovery as a nonconvex optimization problem, which is difficult to solve exactly. To approximately solve the problem, we exploit a stochastic convex optimization technique. A well-organized block selection enables us both to avoid local minima and to achieve fast convergence. Numerical experiments show the effectiveness of the proposed method for both STFT and CQT cases.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"11 3 1","pages":"791-795"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74560715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Talker-Independent Speaker Separation in Reverberant Conditions","authors":"Masood Delfarah, Yuzhou Liu, Deliang Wang","doi":"10.1109/ICASSP40776.2020.9054422","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054422","url":null,"abstract":"Speaker separation refers to the task of separating a mixture signal comprising two or more speakers. Impressive advances have been made recently in deep learning based talker-independent speaker separation. But such advances are achieved in anechoic conditions. We address talker-independent speaker separation in reverberant conditions by exploring a recently proposed deep CASA approach. To effectively deal with speaker separation and speech dereverberation, we propose a two-stage strategy where reverberant utterances are first separated and then dereverberated. The two-stage deep CASA method outperforms other talker-independent separation methods. In addition, the deep CASA algorithm produces substantial speech intelligibility improvements for human listeners, with a particularly large benefit for hearing-impaired listeners.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"52 1","pages":"8723-8727"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78399017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Agent Deep Reinforcement Learning For Distributed Handover Management In Dense MmWave Networks","authors":"Mohamed Sana, A. Domenico, E. Strinati, Antonio Clemente","doi":"10.1109/ICASSP40776.2020.9052936","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052936","url":null,"abstract":"The dense deployment of millimeter wave small cells combined with directional beamforming is a promising solution to enhance the network capacity of the current generation of wireless communications. However, the reliability of millimeter wave communication links can be affected by severe pathloss, blockage, and deafness. As a result, mobile users are subject to frequent handoffs, which deteriorate the user throughput and the battery lifetime of mobile terminals. To tackle this problem, our paper proposes a deep multi-agent reinforcement learning framework for distributed handover management called RHando (Reinforced Handover). We model users as agents that learn how to perform handover to optimize the network throughput while taking into account the associated cost. The proposed solution is fully distributed, thus limiting signaling and computation overhead. Numerical results show that the proposed solution can provide higher throughput compared to conventional schemes while considerably limiting the frequency of the handovers.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"9 1","pages":"8976-8980"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78443120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concentration-Based Polynomial Calculations on Nicked DNA","authors":"Tonglin Chen, Marc D. Riedel","doi":"10.1109/ICASSP40776.2020.9053353","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053353","url":null,"abstract":"In this paper, we introduce a novel scheme for computing polynomial functions on a substrate of nicked DNA. We first discuss a fractional encoding of data, based on the concentration of nicked double DNA strands. Then we show how to perform multiplication on this representation. Next we describe the read-out process, effected by releasing single strands. We show how to perform simple mathematical operations such as addition and subtraction, as well as how to scale constant values using probabilistic switches. We also describe two complex operations: calculating a vector dot product and computing a general polynomial function. We conclude by discussing potential applications of our scheme, practical challenges, and future research directions.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"14 1","pages":"8836-8840"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77290363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Window Design for Joint Spatial-Spectral Domain Filtering of Signals on the Sphere","authors":"Adeem Aslam, Z. Khalid","doi":"10.1109/ICASSP40776.2020.9054085","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054085","url":null,"abstract":"We present the optimal design of an azimuthally symmetric window signal for carrying out joint spatial-spectral domain filtering of a spherical (source) signal contaminated by a realization of an anisotropic noise process. The resulting window is used in the computation of spatially localized spherical harmonic transform of the noise-contaminated signal. We formulate the window design problem using the joint spatial-spectral domain filtering framework and choose the optimality criterion which minimizes the mean square error between the (noise-free) source signal and its filtered estimate. The azimuthally symmetric optimal window signal is shown to be specified by the statistics of the source and noise processes. We illustrate the capability of the proposed window signal by applying the joint spatial-spectral domain filtering framework to the bandlimited Mars topography map and demonstrate improvements in the output signal to noise ratio (SNR) for different values of input SNR.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"117 1","pages":"5785-5789"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75895557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fg2seq: Effectively Encoding Knowledge for End-To-End Task-Oriented Dialog","authors":"Zhenhao He, Yuhong He, Qingyao Wu, Jian Chen","doi":"10.1109/ICASSP40776.2020.9053667","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053667","url":null,"abstract":"End-to-end Task-oriented spoken dialog systems typically require modeling two types of inputs, namely, the dialog history which is a sequence of utterances and the knowledge base (KB) associated with the dialog history. While modeling these inputs, current state-of-the-art models typically ignore the rich structure in the knowledge graph or its intrinsic association with the dialog history. In this paper, we propose a Flow-to-Graph seq2seq model (FG2Seq) which can effectively encode knowledge by considering inherent structural information of the knowledge graph and latent semantic information from dialog history. Experiments on two publicly available task oriented dialog datasets show that our proposed FG2Seq achieves robust performance on generating appropriate system responses and outperforms the baseline systems.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"63 1","pages":"8029-8033"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76151956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering of Nonnegative Data and an Application to Matrix Completion","authors":"Christopher Strohmeier, D. Needell","doi":"10.1109/ICASSP40776.2020.9052980","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052980","url":null,"abstract":"In this article, we propose a simple algorithm to cluster nonnegative data lying in disjoint subspaces. We analyze its performance in relation to a certain measure of correlation between said subspaces. We use our clustering algorithm to develop a matrix completion algorithm which can outperform standard matrix completion algorithms on data matrices satisfying a certain natural low rank condition.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1","pages":"8349-8353"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75080477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revealing Hidden Drawings in Leonardo’s ‘the Virgin of the Rocks’ from Macro X-Ray Fluorescence Scanning Data through Element Line Localisation","authors":"Su Yan, Jun-Jie Huang, Nathan Daly, C. Higgitt, P. Dragotti","doi":"10.1109/ICASSP40776.2020.9054460","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054460","url":null,"abstract":"Macro X-Ray Fluorescence (XRF) scanning is an increasingly widely used imaging technique for the non-invasive detection and mapping of chemical elements in Old Master paintings. Existing approaches for XRF signal analysis require varying degrees of expert user input. They are mainly based on peak fitting at fixed energies associated with each element and require the target elements to be selected manually. In this paper, we propose a new method that can process macro XRF scanning data from paintings fully automatically. The method consists of two parts: 1) detecting pulses in an XRF spectrum using Finite Rate of Innovation (FRI) theory; 2) producing the distribution maps for each element automatically identified in the painting. The results presented show the ability of our method to detect weak or partially overlapping signals and, more excitingly, to visualise underdrawing in a masterpiece by Leonardo da Vinci.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"26 1","pages":"1444-1448"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74936891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}