Demographic and Acoustic Factors related to Automatic Speech Recognition Inaccuracies for Child African American English Speakers
Brittany N Fletcher, Wei-Wen Hsu, Vesna D Novak, Mary E Wilkens, Amy W Hobek, Amy S Pratt, Michelle Leon, Kimmerly Harrell, Victoria S McKenna
Perspectives of the ASHA Special Interest Groups, published 2025-09-30. DOI: 10.1044/2025_persp-25-00052. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490741/pdf/
Purpose: This study investigated the relationship between acoustic measures and inaccuracies in Google's Speech-to-Text recognition of speech produced by children aged 4–9 years who speak African American English (AAE).
Methods: Audio recordings were collected from 11 AAE-speaking children using speech stimuli that targeted final plosive variations observed in the AAE dialect. Dialect density was measured using the Diagnostic Evaluation of Language Variation Language Screener. Recordings were transcribed with Google's Speech-to-Text application (Google Voice), and inaccuracies were identified by comparing these transcriptions with transcriptions produced by the researchers. Acoustic measures of the vowels preceding final plosives (including vowel duration, fundamental frequency, and average F1) were extracted using Praat and a custom MATLAB algorithm. Individual mixed-effects logistic regression models were fitted to analyze the relationships between acoustic measures and transcription accuracy (accurate vs. inaccurate), separately for voiced and voiceless plosives.
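As a rough illustration of what one such model looks like, the equation below assumes a single predictor per model and a random intercept for each child; the abstract does not state the random-effects structure or the coding direction, so both are assumptions. Here $Y_{ij} = 1$ if token $i$ from child $j$ was transcribed accurately, $x_{ij}$ is the predictor (for example, vowel duration; for child-level predictors such as age or dialect density it is constant within a child), and $u_j$ is the child's random intercept.

```latex
\operatorname{logit} \Pr(Y_{ij} = 1)
  = \ln \frac{\Pr(Y_{ij} = 1)}{1 - \Pr(Y_{ij} = 1)}
  = \beta_0 + \beta_1 x_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^{2})
```

Under this form, $e^{\beta_1}$ is the odds ratio for a one-unit increase in $x$: a hypothetical $\beta_1 = -0.5$ would give $e^{-0.5} \approx 0.61$, meaning roughly 39% lower odds of an accurate transcription per unit. Analyzing voiced and voiceless plosives separately amounts to fitting the model once on each subset of tokens.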
Results: Inaccuracy rates did not differ significantly between voiced and voiceless plosive productions, and acoustic measures did not predict speech-to-text inaccuracy. However, age and dialect density were significantly related to transcription accuracy for voiceless plosives.
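The inaccuracy rates compared here rest on coding each elicited token as accurate or inaccurate against the researchers' transcription. The Python sketch below shows one way to do that coding and compute a per-voicing inaccuracy rate. It assumes each stimulus yields a single target word and that the ASR output has already been reduced to the word in that position; the `Token` record, the helper names, and the example tokens are all hypothetical and are not taken from the study, which does not describe its exact matching criterion.

```python
import string
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Token:
    """One elicited target word (hypothetical record layout)."""
    child_id: str
    target: str      # word as transcribed by the researchers
    asr_output: str  # word returned by the speech-to-text system
    voicing: str     # "voiced" or "voiceless" final plosive


def normalize(word: str) -> str:
    # Drop surrounding whitespace/punctuation and lowercase, so "Bed." matches "bed".
    return word.strip(string.punctuation + string.whitespace).lower()


def is_accurate(token: Token) -> bool:
    # Assumed criterion: accurate only if the ASR word exactly matches the target word.
    return normalize(token.asr_output) == normalize(token.target)


def inaccuracy_rates(tokens: list[Token]) -> dict[str, float]:
    # voicing category -> [number inaccurate, number of tokens]
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for token in tokens:
        tally = counts[token.voicing]
        tally[1] += 1
        if not is_accurate(token):
            tally[0] += 1
    return {voicing: bad / total for voicing, (bad, total) in counts.items()}


# Hypothetical tokens for illustration only; not data from the study.
EXAMPLE_TOKENS = [
    Token("C01", "bed", "bet", "voiced"),     # final /d/ recognized as /t/
    Token("C01", "bag", "back", "voiced"),
    Token("C02", "dog", "dog", "voiced"),
    Token("C01", "cat", "Cat.", "voiceless"),
    Token("C02", "cup", "cup", "voiceless"),
    Token("C02", "hat", "ha", "voiceless"),   # final /t/ not recognized
]

if __name__ == "__main__":
    for voicing, rate in sorted(inaccuracy_rates(EXAMPLE_TOKENS).items()):
        print(f"{voicing}: {rate:.2f}")
```

Exact string matching is the simplest possible criterion; a real coding scheme might handle homophones or partial matches differently, and multi-word ASR output would first need to be aligned to the target position.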
Conclusions: Acoustic measures can characterize the complexities of voice, motor, and articulatory development in children, and these measures inform the acoustic algorithms built into speech technology. Research on acoustic measures in young children's AAE speech that accounts for dialect variability and age will enhance speech recognition technology and clinical best practices.