Interrater Reliability of Pediatric Respiratory Auscultation Findings
Sriram Ramgopal, Jennifer K Saper, James R Rudloff, Alexandra T Geanacopoulos, Andrea Rivera-Sepulveda, Theresa Timm, Deborah R Liu, Jane K Soung, Lichuan Liu, Todd A Florin
Hospital Pediatrics. Published 2025-08-25. DOI: 10.1542/hpeds.2025-008510 (https://doi.org/10.1542/hpeds.2025-008510)
Abstract
Objective: To evaluate the interrater reliability of pediatric auscultatory findings as assessed by pediatric emergency medicine (PEM) physicians.
Methods: We conducted a multicenter survey of physicians in 6 academic PEM divisions in the United States. Respondents listened to 15 audio clips of pediatric auscultatory sounds and classified them as normal or as having 1 or more adventitious sounds. We calculated Fleiss' κ to evaluate interrater reliability of auscultatory findings among respondents. We stratified results based on study site and years of experience.
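As a rough illustration of the statistic named in the Methods, the following is a minimal sketch of Fleiss' κ computed from a table of rating counts. The function name `fleiss_kappa`, the input layout (one row per audio clip, one column per category), and the example counts are assumptions made for illustration; the abstract does not describe the authors' software or data layout.

```python
import math
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Every item must be rated by the same number of raters (at least 2).
    Returns NaN when all ratings fall in a single category, because the
    chance-agreement term then equals 1 and kappa is undefined.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])
    total = n_items * n_raters

    # Share of all ratings that fell in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per item: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items        # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance

    if math.isclose(p_e, 1.0):
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: 3 clips, 4 raters, columns = [finding present, absent].
example = [[4, 0], [2, 2], [1, 3]]
print(round(fleiss_kappa(example), 3))
```

For a single finding such as wheeze, each clip would contribute one row with two columns (present, absent), and the row total would equal the number of respondents who rated that clip.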
Results: Surveys were distributed to 128 physicians, with responses from 106 (83% response rate). Only the identification of normal breath sounds (κ = 0.46, 95% CI, 0.45-0.47) met threshold criteria for reliability. Other findings did not reach this threshold, including stridor (κ = 0.32, 95% CI, 0.31-0.33), wheeze (κ = 0.25, 95% CI, 0.24-0.25), crackles (κ = 0.15, 95% CI, 0.15-0.16), and rhonchi (κ = 0.15, 95% CI, 0.14-0.15). Some sites demonstrated greater interrater reliability than others. Stratified by years of experience, acceptable reliability was reached only for normal breath sounds and stridor among physicians with 0 to 4 years of experience, and for normal breath sounds among physicians with 15 or more years of experience. Compared with a reference standard, the highest accuracy was noted in the interpretation of normal breath sounds (accuracy = 0.85, 95% CI, 0.83-0.87).
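The accuracy figure above is a proportion of classifications that matched the reference standard, reported with a 95% confidence interval. The sketch below shows one simple way such an interval can be computed, using a normal-approximation (Wald) interval. The abstract does not state which interval method the authors used, so the method, the function name `accuracy_with_ci`, and the example counts are all assumptions.

```python
import math
from typing import Tuple


def accuracy_with_ci(n_correct: int, n_total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using a normal-approximation interval.

    z = 1.96 corresponds to a two-sided 95% interval. The bounds are
    clipped to the [0, 1] range.
    """
    if n_total <= 0 or not 0 <= n_correct <= n_total:
        raise ValueError("need 0 <= n_correct <= n_total and n_total > 0")
    p = n_correct / n_total
    half_width = z * math.sqrt(p * (1.0 - p) / n_total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, not taken from the study.
acc, lower, upper = accuracy_with_ci(n_correct=85, n_total=100)
print(f"accuracy = {acc:.2f}, 95% CI, {lower:.2f}-{upper:.2f}")
```

This interval treats every classification as independent. Ratings from the same physician or of the same clip are likely correlated, so a clustered or bootstrap approach could give a wider interval.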
Conclusion: We found poor interrater reliability in the interpretation of most pediatric breath sounds, except in the identification of normal breath sounds. These findings support a need for more robust approaches toward the accurate identification of respiratory pathology in children.