Assessing phoneme distribution for speech modeling

Symposium on Medical Information Processing and Analysis. Pub Date: 2023-03-06. DOI: 10.1117/12.2670042

J. A. Parra, C. Calvache, M. Zañartu


Citations: 0

Abstract


Phonetically balanced texts are used to study different voice and speech characteristics. In clinical work and research, these texts provide a standard for quantifying perceptual, acoustic, or aerodynamic assessments. Recent modeling efforts have been devoted to describing long-term speech behaviors based on a collection of sustained phonemes. However, comprehensive descriptions of phoneme distributions representative of connected speech are not readily available. The present study therefore introduces a method to estimate phoneme distributions using text data mining, as an alternative to existing power-law methods. It discusses the procedure used to decompose texts into phonemes, the estimation of the phonetic distributions, and the comparisons between different texts, conversational speech, and standard reading passages. Results are presented using histograms and R-squared coefficients of determination for English, although the approach can be readily applied to other languages. The proposed method, its results, and its limitations are also discussed.
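To make the pipeline in the abstract concrete, the following is a minimal sketch of the three steps it names: decomposing text into phonemes, estimating a relative-frequency distribution, and comparing two distributions with an R-squared coefficient. It is not the authors' implementation. The toy lexicon, the ARPAbet-style symbols, the two sample sentences, and the function names `phoneme_distribution` and `r_squared` are all assumptions made for illustration; a real analysis would presumably rely on a full pronunciation dictionary and much larger corpora.

```python
from collections import Counter

# Hypothetical toy pronunciation dictionary (ARPAbet-style symbols).
# These entries are illustrative only, not taken from the paper.
TOY_LEXICON = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "on": ["AA", "N"],
    "mat": ["M", "AE", "T"],
    "a": ["AH"],
    "dog": ["D", "AO", "G"],
    "ran": ["R", "AE", "N"],
}


def phoneme_distribution(text, lexicon=TOY_LEXICON):
    """Return relative phoneme frequencies for the words found in the lexicon."""
    counts = Counter()
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'")
        # Words missing from the lexicon are skipped in this sketch.
        counts.update(lexicon.get(word, []))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


def r_squared(reference, estimate):
    """Coefficient of determination of `estimate` against `reference`,
    taken over the union of phonemes (absent phonemes count as 0)."""
    phonemes = sorted(set(reference) | set(estimate))
    if not phonemes:
        raise ValueError("both distributions are empty")
    y = [reference.get(p, 0.0) for p in phonemes]
    y_hat = [estimate.get(p, 0.0) for p in phonemes]
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    if ss_tot == 0:
        raise ValueError("reference distribution has zero variance")
    return 1.0 - ss_res / ss_tot


# Two made-up sentences standing in for a reading passage and conversational speech.
passage = phoneme_distribution("The cat sat on the mat.")
speech = phoneme_distribution("A dog ran on the mat.")

# Histogram-style listing, most frequent phoneme first.
for phoneme, freq in sorted(passage.items(), key=lambda kv: -kv[1]):
    print(f"{phoneme:>3} {freq:.3f}")
print("R^2 (passage vs. speech):", round(r_squared(passage, speech), 3))
```

An R-squared value close to 1 would indicate that the second text reproduces the phoneme proportions of the reference closely. Because this definition is not symmetric, which distribution is treated as the reference is a choice the analyst has to make.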

