Comparison of vocal tract shape estimation techniques based on formant frequencies, autocorrelation, covariance and lattice
Ashwini S. Patil, M. Shah
2015 International Conference on Nascent Technologies in the Engineering Field (ICNTE)
DOI: 10.1109/ICNTE.2015.7029934 (https://doi.org/10.1109/ICNTE.2015.7029934)
The vocal tract is one of the most important systems in speech production; it begins at the glottis and ends at the lips. Vocal tract shape (VTS) is defined as the varying cross-sectional area from the glottis to the lips. A literature review shows that most research on vocal tract shape estimation (VTSE) relies on Wakita's algorithm, which is based on the autocorrelation of speech. The objective of this work is to investigate VTSE based on formant frequencies and on the autocorrelation, covariance, and lattice methods. For validation, available vocal tract shape data for vowels obtained by magnetic resonance imaging (MRI) were used. The vowels /a/, /i/, /u/, /o/, the vowel-semivowel-vowel utterances /aya/ and /awa/, and some VCV syllables (/apa/, /uba/) were analyzed for three female and three male speakers. The formant frequency, autocorrelation, covariance, and lattice methods all gave satisfactory results for vowels and semivowels. However, when compared with the MRI shapes, the vowel shapes estimated with the formant frequency technique were more realistic. An investigation of the effect of analysis frame length on VTSE showed that the lattice method required a shorter analysis frame than the autocorrelation and covariance methods, and that its estimated areas were more consistent across analysis frames than those of the other methods.
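To make the autocorrelation route concrete, the following is a minimal sketch of how an area function can be derived from one speech frame in the spirit of Wakita's approach: compute the frame autocorrelation, run the Levinson-Durbin recursion to obtain reflection coefficients, and convert those into relative section areas of a lossless tube model. It is not the authors' implementation. The function names, the unit lip area, and the sign convention in the area ratio are assumptions (conventions differ between texts, and the opposite sign flips the tube end to end), and the pre-emphasis and windowing usually applied before analysis are omitted.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, k, err): the predictor polynomial a[0..order] with a[0] = 1,
    the reflection coefficients k[0..order-1], and the final prediction error.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = float(r[0])
    for m in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("non-positive prediction error (silent frame?)")
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        km = -acc / err
        k.append(km)
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + km * prev[m - j]
        a[m] = km
        err *= 1.0 - km * km
    return a, k, err


def areas_from_reflection(k, lip_area=1.0):
    """Convert reflection coefficients into relative tube section areas.

    Assumption: sections are ordered from the lips towards the glottis and
    adjacent areas follow A_next = A * (1 - k) / (1 + k). Only ratios are
    meaningful; lip_area is an arbitrary reference value.
    """
    areas = [float(lip_area)]
    for km in k:
        if abs(km) >= 1.0:
            raise ValueError("reflection coefficient magnitude must be < 1")
        areas.append(areas[-1] * (1.0 - km) / (1.0 + km))
    return areas


if __name__ == "__main__":
    # Hypothetical stand-in for a speech frame: a decaying exponential.
    frame = [0.5 ** i for i in range(64)]
    order = 4
    r = autocorrelation(frame, order)
    _, k, _ = levinson_durbin(r, order)
    print(areas_from_reflection(k))
```

The covariance and lattice methods compared in the abstract differ in how the predictor or reflection coefficients are estimated from the frame. The final conversion from reflection coefficients to areas would be the same under the assumptions stated here.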