How Reliable are Single-Question Workplace-Based Assessments in Surgery?
Rebecca S. Gates MD, MMHPE; Andrew E. Krumm PhD; Olle ten Cate PhD; Xilin Chen MPH; Kayla Marcotte MS; Angela E. Thelen MD; Shanley B. Deal MD; Adnan Alseidi MD, Ed.M; David Swanson PhD; Brian C. George MD, MAEd
Journal of Surgical Education (Impact Factor 2.6; JCR Q1, Education, Scientific Disciplines). Journal article, published 2024-05-29.
DOI: 10.1016/j.jsurg.2024.03.015
https://www.sciencedirect.com/science/article/pii/S1931720424001612
Citation count: 0
Abstract
OBJECTIVE
Workplace-based assessments (WBAs) play an important role in the assessment of surgical trainees. Because these tools are used by many different faculty members, inter-rater reliability is important to consider when interpreting WBA data. Although there is evidence supporting the validity of many of these tools, evidence of their inter-rater reliability is lacking. This study aimed to evaluate the inter-rater reliability of multiple operative WBA tools used in general surgery residency.
DESIGN
General surgery residents and teaching faculty were recorded during 6 general surgery operations. Nine faculty raters each reviewed the 6 videos and rated each resident on performance (using the Society for Improving Medical Professional Learning, or SIMPL, Performance Scale and the Operative Performance Rating System, or OPRS, Scale), entrustment (using the ten Cate Entrustment-Supervision Scale), and autonomy (using the Zwisch Scale). Inter-rater reliability of the ratings was assessed using percent agreement and intraclass correlations.
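The two statistics named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and the toy ratings are assumptions for illustration, and the ICC variant shown (two-way random effects, absolute agreement, single rater) is an assumed reading of the "absolute" coefficients reported in RESULTS.

```python
from itertools import combinations


def percent_agreement(ratings):
    """Share of rater pairs giving the same score, pooled over all videos.

    ratings: list of rows, one row per video, one score per rater.
    """
    agree = 0
    total = 0
    for row in ratings:
        for a, b in combinations(row, 2):
            agree += int(a == b)
            total += 1
    return agree / total


def icc_absolute_single(ratings):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    Computed from the mean squares of a videos-by-raters ANOVA table.
    """
    n = len(ratings)       # number of videos (subjects)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy ratings (NOT study data): 6 videos rated by 4 hypothetical raters.
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(percent_agreement(toy), 2), round(icc_absolute_single(toy), 2))
```

Percent agreement counts only exact matches, so it falls as a scale gains levels, whereas the absolute-agreement ICC also penalizes raters who are consistently more lenient or more severe than their peers.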
PARTICIPANTS
Nine faculty members viewed the videos and assigned ratings for multiple WBAs.
RESULTS
Absolute intraclass correlation coefficients for each scale ranged from 0.33 to 0.47.
CONCLUSIONS
All single-item WBA scales had low to moderate inter-rater reliability. While rater training may improve inter-rater reliability for single observations, many observations by many raters are needed to reliably assess trainee performance in the workplace.
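The need for many ratings can be made concrete with the Spearman-Brown prophecy formula, sketched below. This is an illustration rather than an analysis from the study: the 0.80 reliability target and the function names are assumptions, and the formula presumes the averaged ratings are independent and interchangeable.

```python
import math


def spearman_brown(single_icc, k):
    """Reliability of the mean of k ratings, given single-rating reliability."""
    return k * single_icc / (1 + (k - 1) * single_icc)


def ratings_needed(single_icc, target):
    """Smallest number of ratings whose mean reaches the target reliability."""
    k = target * (1 - single_icc) / (single_icc * (1 - target))
    return math.ceil(k)


# Reported range of single-rating ICCs; the 0.80 target is an assumed threshold.
for icc in (0.33, 0.47):
    k = ratings_needed(icc, 0.80)
    print(icc, k, round(spearman_brown(icc, k), 2))
```

Under these assumptions, a scale at the low end of the reported range would need roughly twice as many ratings as one at the high end to reach the same target.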
About the journal:
The Journal of Surgical Education (JSE) is dedicated to advancing the field of surgical education through original research. The journal publishes research articles in all surgical disciplines on topics related to the education of surgical students, residents, and fellows, as well as practicing surgeons. Our readers look to JSE for timely, innovative research findings from the international surgical education community. As the official journal of the Association of Program Directors in Surgery (APDS), JSE publishes the proceedings of the annual APDS meeting held during Surgery Education Week.