Parallel Recognition of Mandarin Tones and Focus from continuous F0
Yue Chen, Yi Xu
1st International Conference on Tone and Intonation (TAI), published 2021-12-06
DOI: https://doi.org/10.21437/tai.2021-35
Citations: 1
Abstract
In tonal languages, F0 contours encode not only lexical tones but also prosodic focus. This concurrent encoding of tone and intonation in speech production can be simulated computationally by speech synthesis models. It remains unclear, however, how both tone and focus are decoded in perception from a single stream of surface F0 contours. To address this question indirectly, we applied a support vector machine (SVM) model to recognize tone and focus from F0 trajectories in an experimental Mandarin corpus. Three sub-experiments compared recognition strategies: recognizing tones only, recognizing focus only, and recognizing tones and focus at the same time. The recognition rate for the four tones regardless of focus was 88.3%, and the recognition rate for focus regardless of tone was 77.5%. The overall recognition rates for tone-focus combinations were similar to those of the first two experiments, but the breakdown of accuracies showed that the recognition rate varied extensively across both focus conditions and tone conditions. These results indicate that the perception of tone and the perception of focus in continuous speech likely depend on each other, and that tone and focus can be recognized in parallel.
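The three recognition strategies differ mainly in how each token is labeled, and the joint strategy can be scored both as a whole and broken down by tone or by focus condition. The following is a minimal sketch of that bookkeeping only, not the authors' implementation: the focus condition names, the helper functions `make_label`, `accuracy`, and `breakdown`, and the four toy tokens are all assumptions made for illustration, and the predictions are hand-written rather than produced by an SVM.

```python
from collections import defaultdict

TONES = ("T1", "T2", "T3", "T4")  # the four Mandarin lexical tones
# Hypothetical focus conditions: the abstract does not name the actual ones.
FOCUS_CONDITIONS = ("on-focus", "pre-focus", "post-focus")


def make_label(strategy, tone, focus):
    """Return the class label a classifier would be trained on for one token."""
    if tone not in TONES or focus not in FOCUS_CONDITIONS:
        raise ValueError("unknown tone or focus condition")
    if strategy == "tone":
        return tone
    if strategy == "focus":
        return focus
    if strategy == "joint":
        return (tone, focus)
    raise ValueError(f"unknown strategy: {strategy}")


def accuracy(gold, pred):
    """Fraction of tokens whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def breakdown(gold, pred, index):
    """Joint-label accuracy grouped by one component of the gold pair.

    index=0 groups by tone, index=1 groups by focus condition.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        totals[g[index]] += 1
        hits[g[index]] += int(g == p)
    return {key: hits[key] / totals[key] for key in totals}


# Toy (tone, focus) tokens and hand-written predictions, for illustration only.
gold_tokens = [("T1", "on-focus"), ("T2", "on-focus"),
               ("T3", "post-focus"), ("T4", "pre-focus")]
pred_tokens = [("T1", "on-focus"), ("T2", "post-focus"),
               ("T3", "post-focus"), ("T1", "pre-focus")]

# Labels under each strategy.
gold_joint = [make_label("joint", t, f) for t, f in gold_tokens]
pred_joint = [make_label("joint", t, f) for t, f in pred_tokens]
gold_tone = [make_label("tone", t, f) for t, f in gold_tokens]
pred_tone = [make_label("tone", t, f) for t, f in pred_tokens]
gold_focus = [make_label("focus", t, f) for t, f in gold_tokens]
pred_focus = [make_label("focus", t, f) for t, f in pred_tokens]

joint_acc = accuracy(gold_joint, pred_joint)   # both components must match
tone_acc = accuracy(gold_tone, pred_tone)      # tone regardless of focus
focus_acc = accuracy(gold_focus, pred_focus)   # focus regardless of tone
by_tone = breakdown(gold_joint, pred_joint, 0)
by_focus = breakdown(gold_joint, pred_joint, 1)
```

Scoring the joint labels per tone and per focus condition is what makes uneven performance visible: an overall rate can look similar to the single-task rates while individual tone or focus conditions differ widely.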