Comparison of vocal tract shape estimation techniques based on formant frequencies, autocorrelation, covariance and lattice

2015 International Conference on Nascent Technologies in the Engineering Field (ICNTE). DOI: 10.1109/ICNTE.2015.7029934

Ashwini S. Patil, M. Shah


Citations: 3

Abstract




The vocal tract is one of the most important systems in speech production; it begins at the glottis and ends at the lips. Vocal tract shape (VTS) is defined as the cross-sectional area as it varies from the glottis to the lips. A review of the literature shows that most research on vocal tract shape estimation (VTSE) is based on Wakita's algorithm, which relies on the autocorrelation of speech. The objective of this work is to investigate VTSE based on formant frequencies and on the autocorrelation, covariance, and lattice methods. For validation, vocal tract shape data for vowels obtained with magnetic resonance imaging (MRI) were used. The vowels /a/, /i/, /u/, and /o/, the vowel-semivowel-vowel utterances /aya/ and /awa/, and VCV syllables such as /apa/ and /uba/ were analyzed for three female and three male speakers. The formant frequency, autocorrelation, covariance, and lattice methods all gave satisfactory results for vowels and semivowels. However, for vowels, the VTS estimated with the formant frequency technique was more realistic when compared with the MRI shapes. An investigation of the effect of analysis frame length on VTSE showed that the lattice method required a shorter analysis frame than the autocorrelation and covariance methods, and that its estimated areas were more consistent across analysis frames than those of the other methods.
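The abstract names the estimation methods but not their mechanics. As a hedged illustration only, and not the authors' implementation, the sketch below follows the pipeline commonly associated with Wakita's approach: pre-emphasize a frame, estimate reflection (PARCOR) coefficients, and convert them to relative cross-sectional areas of a lossless-tube model. It shows the autocorrelation method (Levinson-Durbin on a Hamming-windowed frame) and a lattice method, using Burg's algorithm as an assumed stand-in for the paper's lattice formulation; the covariance and formant-frequency methods are not shown. The pre-emphasis coefficient `alpha=0.98`, the window, the model order, the area-ratio sign convention, and every function name here are assumptions chosen for illustration.

```python
import math
import random


def pre_emphasis(x, alpha=0.98):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    """r[0..max_lag] of one frame; samples outside the frame are treated as zero."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Returns (a, k, err): predictor coefficients, reflection coefficients, final error energy.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= 1.0 - ki * ki
    return a, k, err


def burg_reflection(x, order):
    """Burg (lattice) reflection coefficients, same sign convention as above."""
    f = list(x)  # forward prediction error
    b = list(x)  # backward prediction error
    k = []
    for m in range(order):
        idx = range(m + 1, len(x))
        num = sum(f[n] * b[n - 1] for n in idx)
        den = sum(f[n] ** 2 + b[n - 1] ** 2 for n in idx)
        km = -2.0 * num / den
        k.append(km)
        new_f, new_b = f[:], b[:]
        for n in idx:
            new_f[n] = f[n] + km * b[n - 1]
            new_b[n] = b[n - 1] + km * f[n]
        f, b = new_f, new_b
    return k


def area_function(k, lips_area=1.0):
    """Relative tube areas, section 0 at the lips.

    ASSUMED convention: A[i+1] / A[i] = (1 + k[i]) / (1 - k[i]).
    References differ on sign and ordering; flip k if yours does.
    """
    areas = [lips_area]
    for ki in k:
        areas.append(areas[-1] * (1.0 + ki) / (1.0 - ki))
    return areas


def estimate_area_function(frame, order, method="autocorrelation", alpha=0.98):
    """Pre-emphasize one frame, estimate reflection coefficients, map them to areas."""
    x = pre_emphasis(frame, alpha)
    if method == "autocorrelation":
        xw = [xi * wi for xi, wi in zip(x, hamming(len(x)))]
        _, k, _ = levinson_durbin(autocorrelation(xw, order), order)
    elif method == "lattice":
        k = burg_reflection(x, order)
    else:
        raise ValueError(f"unknown method: {method}")
    return area_function(k)


if __name__ == "__main__":
    # Synthetic stand-in for a voiced frame: a stable random AR(2) signal, not real speech.
    random.seed(0)
    sig = [0.0, 0.0]
    for _ in range(400):
        sig.append(1.3 * sig[-1] - 0.6 * sig[-2] + random.gauss(0.0, 1.0))
    for method in ("autocorrelation", "lattice"):
        areas = estimate_area_function(sig[2:], order=10, method=method)
        print(method, [round(a, 2) for a in areas])
```

One design note: the autocorrelation method assumes the signal is zero outside the windowed frame, whereas Burg's recursion works directly on the unwindowed samples. This difference is one commonly cited reason lattice estimates hold up on short frames, but the sketch does not reproduce the paper's frame-length experiment or its MRI comparison.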

