Title: Non-Griffin–Lim Type Signal Recovery from Magnitude Spectrogram
Authors: Ryusei Nakatsu, D. Kitahara, A. Hirabayashi
Venue: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 791-795
Publication date: 2020-05-01
DOI: 10.1109/ICASSP40776.2020.9053576
URL: https://doi.org/10.1109/ICASSP40776.2020.9053576
Citations: 3
Abstract
Speech and audio signal processing frequently requires recovering a time-domain signal from the magnitude of a spectrogram. Conventional methods apply the inverse transform to the magnitude spectrogram combined with a phase spectrogram recovered by the Griffin–Lim algorithm or its accelerated versions. The short-time Fourier transform (STFT) fits this framework perfectly, whereas other useful spectrogram transforms, such as the constant-Q transform (CQT), do not, because their inverses cannot be computed easily. To make the best of such transforms, we propose an algorithm that recovers the time-domain signal without the inverse spectrogram transform. We formulate signal recovery as a nonconvex optimization problem, which is difficult to solve exactly. To solve the problem approximately, we exploit a stochastic convex optimization technique. A well-organized block selection enables us both to avoid local minima and to achieve fast convergence. Numerical experiments show the effectiveness of the proposed method in both the STFT and CQT cases.
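For orientation, the conventional framework mentioned in the abstract can be sketched as follows. This is a minimal NumPy illustration of the classical Griffin–Lim iteration for the STFT case only; it is not the method proposed in the paper. The function names, the periodic Hann window, the hop size, and the test signal are all assumptions chosen for the example. Note that every iteration calls `istft`, which is exactly the step that is hard to obtain for transforms such as the CQT.

```python
import numpy as np

def stft(x, win, hop):
    """Forward STFT: window each frame, then take a real FFT (frames x bins)."""
    n_fft = len(win)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, win, hop, length):
    """Least-squares inverse STFT: windowed overlap-add, normalized by sum of win^2."""
    n_fft = len(win)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * win
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    # Guard against division by zero where no window energy covers a sample.
    return y / np.maximum(norm, 1e-8)

def griffin_lim(mag, win, hop, length, n_iter=50, seed=0):
    """Alternate between the known magnitude and the phase of a consistent STFT."""
    rng = np.random.default_rng(seed)
    X = mag * np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        x = istft(X, win, hop, length)        # requires the inverse transform
        Y = stft(x, win, hop)                 # re-analyze the current estimate
        X = mag * np.exp(1j * np.angle(Y))    # keep new phase, restore magnitude
    return istft(X, win, hop, length)

def spectral_error(x, mag, win, hop):
    """Relative mismatch between |STFT(x)| and the target magnitude."""
    return np.linalg.norm(np.abs(stft(x, win, hop)) - mag) / np.linalg.norm(mag)

# Illustrative usage on a synthetic two-tone signal (hypothetical parameters).
win = np.hanning(257)[:-1]                    # periodic Hann window, length 256
hop = 64
t = np.arange(2048)
signal = np.sin(2 * np.pi * 0.01 * t) + 0.5 * np.sin(2 * np.pi * 0.05 * t)
mag = np.abs(stft(signal, win, hop))
recovered = griffin_lim(mag, win, hop, len(signal), n_iter=50)
print("relative spectral error:", spectral_error(recovered, mag, win, hop))
```

The recovered signal matches the target only up to the ambiguities that are inherent in magnitude-only recovery, such as a global sign flip, so the sketch measures quality in the spectrogram domain and does not compare waveforms.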