Authors: M. I. Molla, Mahboob Qaosar, K. Hirose
Published in: 2016 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 770-773, 2016-05-22
DOI: 10.1109/ISCAS.2016.7527354 (https://doi.org/10.1109/ISCAS.2016.7527354)
Instantaneous pitch estimation of noisy speech signal with multivariate SST
This paper presents an instantaneous pitch estimation method based on data-adaptive time-domain filtering and the multivariate synchrosqueezing transform (SST). The filtering is implemented with bivariate empirical mode decomposition (bEMD), using white Gaussian noise (wGn) as the reference signal. The bEMD decomposes the speech and the wGn together into a finite set of intrinsic mode functions (IMFs). The log-energy distribution of the wGn IMFs is used to determine the filtering threshold. The speech IMFs selected by this pre-filtering step are then used to construct a time-frequency representation (TFR) with the multivariate SST, in which the frequency components are well localized. Spatial filtering and post-processing are applied to the TFR before the instantaneous pitch is estimated. The experimental results illustrate the noise robustness of the proposed algorithm and its superiority.
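The pre-filtering step can be illustrated with a minimal sketch. The abstract does not state the exact threshold rule, so the rule below is an assumption: a speech IMF is kept when its log-energy exceeds that of the wGn IMF at the same index by a chosen margin. The function names `log_energy` and `select_speech_imfs` and the `margin` parameter are hypothetical, and the IMFs are assumed to have already been produced by a bEMD implementation that is not shown here.

```python
import math
from typing import List, Sequence


def log_energy(imf: Sequence[float]) -> float:
    """Return log2 of the energy (sum of squared samples) of one IMF."""
    energy = sum(x * x for x in imf)
    # A tiny floor avoids log(0) for an all-zero component.
    return math.log2(max(energy, 1e-300))


def select_speech_imfs(
    speech_imfs: Sequence[Sequence[float]],
    noise_imfs: Sequence[Sequence[float]],
    margin: float = 0.0,
) -> List[int]:
    """Return indices of speech IMFs that pass the assumed threshold.

    Assumed rule (not taken from the paper): keep speech IMF k when its
    log-energy is greater than the log-energy of noise IMF k plus `margin`.
    """
    if len(speech_imfs) != len(noise_imfs):
        raise ValueError("speech and noise must have the same number of IMFs")
    kept = []
    for k, (speech, noise) in enumerate(zip(speech_imfs, noise_imfs)):
        if log_energy(speech) > log_energy(noise) + margin:
            kept.append(k)
    return kept


# Toy data standing in for bEMD output: three IMFs of four samples each.
speech_imfs = [
    [0.1, -0.1, 0.1, -0.1],   # weak component, below the noise level
    [1.0, -1.0, 1.0, -1.0],   # stronger than the paired noise IMF
    [2.0, 2.0, -2.0, -2.0],   # strongest component
]
noise_imfs = [[0.5, -0.5, 0.5, -0.5]] * 3

selected = select_speech_imfs(speech_imfs, noise_imfs)
print(selected)
```

In this toy case only the second and third components are retained. In the pipeline the abstract describes, the retained IMFs would then be passed to the multivariate SST to build the TFR.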