Reliability and Task Effects in CAPE-V Auditory-Perceptual Voice Assessments: Insights From the PVQD30 Subset
Authors: Timothy Pommée, Sara-Eve Renaud, Ingrid Verduyckt
Journal: Journal of Voice (JCR Q1, Audiology & Speech-Language Pathology; impact factor 2.5)
Publication type: Journal Article
Published: 2025-03-04
DOI: 10.1016/j.jvoice.2025.02.020
URL: https://doi.org/10.1016/j.jvoice.2025.02.020
Abstract
Objectives: This study aimed to evaluate the inter- and intra-rater reliability of Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) ratings and to explore task-specific differences (sustained vowels versus sentences) in ratings and reliability.
Study design: Cross-sectional reliability study using a curated subset of dysphonic voice samples (PVQD30).
Methods: Thirty voice samples representing varying dysphonia severities were selected from the Perceptual Voice Qualities Database. Eight Quebecois speech-language pathologists (SLPs) rated the samples using the CAPE-V protocol on the Bridge2Practice platform. Ratings included six vocal features on a visual analog scale (VAS) and binary consistency (C/I) judgments. Reliability was assessed using intra-class correlation coefficients (ICCs) for VAS ratings and Gwet's AC1 for C/I ratings. Task effects were analyzed using Wilcoxon signed-rank tests and Spearman correlations.
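As an illustration of the ICC step described in the Methods, the following is a minimal sketch of one common ICC form. The abstract does not state which ICC model the authors used, so the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is an assumption here, and the function name `icc2_1` and the input layout are hypothetical.

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per voice sample, each row holding
    one VAS score per rater (no missing values). This layout and the choice
    of ICC(2,1) are assumptions, not details taken from the study.
    """
    n = len(ratings)        # number of samples (subjects)
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-samples mean square
    msc = ss_cols / (k - 1)                 # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

When every rater gives identical scores to samples that differ from one another, the function returns 1 up to floating-point rounding. It is undefined (division by zero) if all scores in the matrix are identical, and systematic offsets between raters lower the value because this form measures absolute agreement.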
Results: Overall severity ratings demonstrated good inter-rater reliability for both vowels (ICC = 0.79) and sentences (ICC = 0.87). Pitch and loudness ratings showed low inter-rater reliability (ICCs < 0.5) across tasks. Vowels were rated as more impaired for most features, except strain, which showed higher impairment on sentences. Inter-rater reliability was higher for roughness and breathiness on vowels, whereas strain showed better reliability on sentences. Intra-rater reliability was consistently higher on sentences for all features (ICCs > 0.75 for most). Consistency ratings were more reliable on vowels than sentences for most features, except loudness.
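The reliability of the binary consistency (C/I) judgments was quantified with Gwet's AC1. A minimal sketch of the multi-rater AC1 statistic follows; the function name `gwet_ac1`, the `"C"`/`"I"` labels as input values, and the assumption that every sample has at least two ratings and no missing ones are illustrative choices, not details from the study.

```python
from collections import Counter


def gwet_ac1(ratings, categories=("C", "I")):
    """Gwet's AC1 chance-corrected agreement for nominal ratings.

    `ratings` is a list of items; each item is the list of category labels
    assigned by the raters who judged it (at least two per item).
    """
    n = len(ratings)
    q = len(categories)

    agreement = 0.0
    prevalence = {c: 0.0 for c in categories}
    for item in ratings:
        r = len(item)
        counts = Counter(item)
        # Proportion of agreeing rater pairs for this item.
        agreement += sum(counts[c] * (counts[c] - 1) for c in categories) / (r * (r - 1))
        # Share of raters who chose each category for this item.
        for c in categories:
            prevalence[c] += counts[c] / r

    pa = agreement / n  # observed agreement
    # Chance agreement as defined by Gwet: based on average category prevalence.
    pe = sum((p / n) * (1 - p / n) for p in prevalence.values()) / (q - 1)
    return (pa - pe) / (1 - pe)
```

For example, with two raters and four samples rated `["C", "C"]`, `["C", "C"]`, `["C", "I"]`, `["I", "I"]`, observed agreement is 0.75, chance agreement is 0.46875, and AC1 is 9/17, or about 0.53.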
Conclusions: Task type significantly impacts CAPE-V ratings and their reliability. Vowels provided higher inter-rater reliability for roughness and breathiness, while sentences yielded better intra-rater consistency and strain reliability. These findings highlight the need for ongoing refinement of assessment tools and training protocols to ensure accurate and reliable voice evaluations.
About the journal:
The Journal of Voice is widely regarded as the world's premier journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.