Glottal instants extraction from speech signal using Deep Feature Loss
Authors: Supritha M. Shetty, Suraj Durgesht, K. Deepak
Published in: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM), 2022-07-11
DOI: 10.1109/SPCOM55316.2022.9840808 (https://doi.org/10.1109/SPCOM55316.2022.9840808)
The electroglottograph (EGG) is a device that measures the conductance between the vocal folds. Analysis of the EGG signal has many applications in the literature, such as speech-to-text synthesis, voice disorder analysis, emotion recognition, and speaker verification, so the EGG device is essential for recording vocal fold activity. As an alternative, this work proposes a method that synthesizes the EGG waveform from the speech signal using a context aggregation convolutional neural network. The synthesis network is trained with deep feature losses, which are obtained with the help of a second network, the EGG classification network. The synthesized EGG signal then needs to be characterized, which is done here in terms of glottal instants. During voiced speech production, the instants at which the vocal folds attain complete closure are called glottal closure instants (GCIs), and the instants at which they open are called glottal opening instants (GOIs). Both can be measured reliably from the EGG signal. The CMU-Arctic database, a parallel corpus of simultaneously recorded speech and EGG signals, is used to train the synthesis network and to compare the proposed method with other state-of-the-art techniques. Glottal instant extraction from the synthesized EGG signals is found to perform comparably to those methods.
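
To make the idea of a deep feature loss more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the loss is a weighted sum of mean absolute differences between the intermediate activations that a fixed feature network produces for the synthesized and the reference EGG. The abstract does not describe the EGG classification network, so the layer weights (`TOY_LAYERS`), the leaky-ReLU activation, and the function names are all hypothetical stand-ins.

```python
# Illustrative sketch only: a toy "loss network" with hand-picked weights.
# In the setting described above, the feature network would be a trained
# EGG classification network; its architecture is NOT given in the source.

def conv1d_same(x, kernel):
    """Same-length 1-D cross-correlation with zero padding."""
    half = len(kernel) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = n + k - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def leaky_relu(x, slope=0.2):
    """Element-wise leaky ReLU (assumed activation)."""
    return [v if v > 0 else slope * v for v in x]


# Hypothetical fixed kernels standing in for trained classifier layers.
TOY_LAYERS = [
    [0.5, 0.0, -0.5],
    [0.25, 0.5, 0.25],
    [-1.0, 2.0, -1.0],
]


def feature_maps(signal, layers=TOY_LAYERS):
    """Return the activations of every layer for one input signal."""
    feats = []
    h = list(signal)
    for kernel in layers:
        h = leaky_relu(conv1d_same(h, kernel))
        feats.append(h)
    return feats


def deep_feature_loss(synth, ref, layer_weights=None, layers=TOY_LAYERS):
    """Weighted sum of per-layer mean absolute feature differences."""
    f_synth = feature_maps(synth, layers)
    f_ref = feature_maps(ref, layers)
    if layer_weights is None:
        layer_weights = [1.0] * len(f_synth)
    total = 0.0
    for w, a, b in zip(layer_weights, f_synth, f_ref):
        total += w * sum(abs(p - q) for p, q in zip(a, b)) / len(a)
    return total


# Usage: the loss is zero for identical signals and positive otherwise.
reference_egg = [0.0, 1.0, 0.0, -1.0] * 10
synthesized_egg = [0.5 * v for v in reference_egg]
loss_value = deep_feature_loss(synthesized_egg, reference_egg)
```

In a training loop, a loss of this kind would be minimized with respect to the synthesis network's parameters while the feature network stays frozen. Comparing signals in feature space rather than sample by sample is the general motivation for deep feature losses, though the exact layers and weights used in this work are not stated in the abstract.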