On Adaptive LASSO-based Sparse Time-Varying Complex AR Speech Analysis
K. Funaki
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), published 2021-01-24
DOI: 10.1109/ISCSLP49672.2021.9362085 (https://doi.org/10.1109/ISCSLP49672.2021.9362085)
Citation count: 0
Abstract
Linear Prediction (LP) is commonly used in speech processing. In speech coding, LP is used to remove the formant components from the speech signal, and the residual is quantized using algebraic code vectors after the pitch components are removed. In speech synthesis, LP is also used to generate the glottal or residual excitation for WaveNet. We have proposed Time-Varying Complex AR (TV-CAR) speech analysis for an analytic signal to cope with the drawbacks of LP; the methods proposed so far, such as MMSE and Extended Least Squares (ELS), are l2-norm optimization methods. We have already evaluated their performance on F0 estimation and robust automatic speech recognition. Recently, we proposed l2-norm regularized LP-based TV-CAR analysis in the time domain and the frequency domain. The regularized TV-CAR method estimates formant frequencies more accurately, and we have shown that the resulting LP residual makes more precise F0 estimation possible. Meanwhile, sparse estimation based on l1-norm optimization, which can extract meaningful information from very large amounts of data, has attracted attention in image processing. The LASSO algorithm is an l1-norm regularized sparse estimation algorithm. In this paper, adaptive LASSO-based TV-CAR analysis is proposed, and its performance is evaluated using F0 estimation.
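The contrast the abstract draws between l2-norm regularization, l1-norm (LASSO) regularization, and adaptive LASSO can be illustrated with a minimal sketch. It is not the TV-CAR method of the paper: it assumes a real-valued, time-invariant AR model instead of a time-varying complex one, uses a generic ISTA solver for the l1 problem, and follows the standard adaptive LASSO recipe of weighting the penalty by an initial estimate. All function names, the test signal, and the values of `lam`, `gamma`, and `order` are hypothetical choices made for illustration.

```python
import numpy as np

def lp_design(x, order):
    """Least-squares LP system: x[n] is predicted from x[n-1..n-order]."""
    n = len(x)
    # Column k-1 holds the signal delayed by k samples.
    X = np.column_stack([x[order - k : n - k] for k in range(1, order + 1)])
    return X, x[order:]

def ridge_lp(X, y, lam):
    """l2-regularized least squares, solved in closed form."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_lp(X, y, lam, n_iter=500):
    """ISTA for min_a 0.5 * ||y - X a||^2 + lam * ||a||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - step * (X.T @ (X @ a - y)), step * lam)
    return a

def adaptive_lasso_lp(X, y, lam, gamma=1.0, eps=1e-8):
    """Adaptive LASSO: penalty weights taken from an initial ridge estimate."""
    w = 1.0 / (np.abs(ridge_lp(X, y, 1e-3)) ** gamma + eps)
    # Rescaling column k by 1/w[k] turns the weighted l1 penalty into a plain one.
    return lasso_lp(X / w, y, lam) / w

# Hypothetical test signal: an AR(2) process analysed with an over-sized order of 8.
rng = np.random.default_rng(0)
e = rng.standard_normal(2000)
x = np.zeros_like(e)
for i in range(2, len(x)):
    x[i] = 1.5 * x[i - 1] - 0.8 * x[i - 2] + e[i]

order = 8
X, y = lp_design(x, order)
a_ridge = ridge_lp(X, y, 1e-3)         # l2-regularized estimate
a_lasso = lasso_lp(X, y, 50.0)         # l1-regularized estimate
a_ada = adaptive_lasso_lp(X, y, 50.0)  # adaptively weighted l1 estimate
residual = y - X @ a_ridge             # LP residual (inverse-filtered signal)
```

In this sketch the soft-thresholding step is what can set small coefficients exactly to zero, whereas the ridge solution only shrinks them. The adaptive weights penalize coefficients that were small in the initial estimate more heavily than those that were large.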