{"title":"扬声器物理参数估计的短期分析","authors":"Rita Singh, B. Raj, J. Baker","doi":"10.1109/IWBF.2016.7449696","DOIUrl":null,"url":null,"abstract":"Conventional approaches to estimating speakers' physiometric parameters such as height, age, weight etc. from their voice analyze the speech signal at relatively coarse time resolutions, typically with analysis windows of 25ms or longer. At these resolutions the analysis effectively captures the structure of the supra-glottal vocal tract. In this paper we hypothesize that by analyzing the signal at a finer temporal resolution that is lower than a pitch period, it may be possible to analyze segments of the speech signal that are obtained entirely when the glottis is open, and thereby capture some of the sub-glottal structure that may be represented in the voice. To explore this hypothesis we propose an analysis approach that combines signal analysis techniques suited to fine-temporal-resolution analysis and well-known regression models. We test it on the prediction of heights and ages of speakers from a standard speech database. Our findings show that the higher-resolution analysis does provide benefits over conventional analysis for estimating speaker height, although it is less useful in predicting age.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Short-term analysis for estimating physical parameters of speakers\",\"authors\":\"Rita Singh, B. Raj, J. Baker\",\"doi\":\"10.1109/IWBF.2016.7449696\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Conventional approaches to estimating speakers' physiometric parameters such as height, age, weight etc. from their voice analyze the speech signal at relatively coarse time resolutions, typically with analysis windows of 25ms or longer. At these resolutions the analysis effectively captures the structure of the supra-glottal vocal tract. In this paper we hypothesize that by analyzing the signal at a finer temporal resolution that is lower than a pitch period, it may be possible to analyze segments of the speech signal that are obtained entirely when the glottis is open, and thereby capture some of the sub-glottal structure that may be represented in the voice. To explore this hypothesis we propose an analysis approach that combines signal analysis techniques suited to fine-temporal-resolution analysis and well-known regression models. We test it on the prediction of heights and ages of speakers from a standard speech database. Our findings show that the higher-resolution analysis does provide benefits over conventional analysis for estimating speaker height, although it is less useful in predicting age.\",\"PeriodicalId\":282164,\"journal\":{\"name\":\"2016 4th International Conference on Biometrics and Forensics (IWBF)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-03-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 4th International Conference on Biometrics and Forensics (IWBF)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IWBF.2016.7449696\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 4th International Conference on Biometrics and Forensics (IWBF)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IWBF.2016.7449696","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Short-term analysis for estimating physical parameters of speakers
Conventional approaches to estimating speakers' physiometric parameters such as height, age, weight etc. from their voice analyze the speech signal at relatively coarse time resolutions, typically with analysis windows of 25ms or longer. At these resolutions the analysis effectively captures the structure of the supra-glottal vocal tract. In this paper we hypothesize that by analyzing the signal at a finer temporal resolution that is lower than a pitch period, it may be possible to analyze segments of the speech signal that are obtained entirely when the glottis is open, and thereby capture some of the sub-glottal structure that may be represented in the voice. To explore this hypothesis we propose an analysis approach that combines signal analysis techniques suited to fine-temporal-resolution analysis and well-known regression models. We test it on the prediction of heights and ages of speakers from a standard speech database. Our findings show that the higher-resolution analysis does provide benefits over conventional analysis for estimating speaker height, although it is less useful in predicting age.