An evaluation of the inter-rater reliability in a clinical skills objective structured clinical examination

IF 0.6 Q4 HEALTH CARE SCIENCES & SERVICES

African Journal of Health Professions Education. Pub Date: 2023-05-18. DOI: 10.7196/ajhpe.2023.v15i2.1574

V. De Beer, J. Nel, FP Pieterse, A. Snyman, G. Joubert, M. Labuschagne


Citations: 0

Abstract


An evaluation of the inter-rater reliability in a clinical skills objective structured clinical examination

Background. An objective structured clinical examination (OSCE) is a performance-based examination used to assess health sciences students and is a well-recognised tool to assess clinical skills with or without using real patients.

Objectives. To determine the inter-rater reliability of experienced and novice assessors from different clinical backgrounds on the final mark allocations during assessment of third-year medical students’ final OSCE at the University of the Free State.

Methods. This cross-sectional analytical study included 24 assessors and 145 students. After training and written instructions, two assessors per station (urology history taking, respiratory examination and gynaecology skills assessment) each independently assessed the same student for the same skill by completing their individual checklists. At each station, assessors could also give a global rating mark (from 1 to 5) as an overall impression.

Results. The urology history-taking station had the lowest mean score (53.4%) and the gynaecology skills station the highest (71.1%). Seven (58.3%) of the 12 assessor pairs differed by >5% regarding the final mark, with differences ranging from 5.2% to 12.2%. For two pairs the entire confidence interval (CI) was within the 5% range, whereas for five pairs the entire CI was outside the 5% range. Only one pair achieved substantial agreement (weighted kappa statistic 0.74, urology history taking). There was no consistency within or across stations regarding whether the experienced or novice assessor gave higher marks. For the respiratory examination and gynaecology skills stations, all pairs differed for the majority of students regarding the global rating mark. Weighted kappa statistics indicated that no pair achieved substantial agreement regarding this mark.

Conclusion. Despite previous experience, written instructions and training in the use of the checklists, differences between assessors were found in most cases.
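The weighted kappa statistic reported in the Results can be illustrated with a minimal sketch. The abstract does not say whether linear or quadratic weights were used, nor which software computed the statistic, so the `power` parameter, the function name `weighted_kappa` and the ratings below are all assumptions made for illustration; none of the data come from the study. The label "substantial" is commonly attached to values from 0.61 to 0.80 under the Landis and Koch convention, which the abstract does not name explicitly.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5), power=1):
    """Cohen's weighted kappa for two raters scoring the same students.

    power=1 uses linear disagreement weights, power=2 quadratic weights.
    Which of the two the study used is not stated in the abstract.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of students")
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint counts and each rater's marginal counts.
    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    # Disagreement weight grows with the distance between the two ratings.
    def weight(i, j):
        return (abs(i - j) / (k - 1)) ** power

    observed_disagreement = sum(
        weight(i, j) * count / n for (i, j), count in observed.items()
    )
    # Disagreement expected by chance if the raters scored independently.
    expected_disagreement = sum(
        weight(i, j) * (marg_a[i] / n) * (marg_b[j] / n)
        for i in range(k)
        for j in range(k)
    )
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical global rating marks (1 to 5) from one assessor pair; not study data.
experienced = [3, 4, 2, 5, 3, 4, 1, 3]
novice = [3, 3, 2, 4, 4, 4, 2, 3]
kappa = weighted_kappa(experienced, novice)
print(f"weighted kappa = {kappa:.2f}")
```

Unlike a raw percentage of identical marks, this statistic gives partial credit when two assessors are one point apart on the 1 to 5 scale and corrects for the agreement expected by chance, which is why it suits an ordinal global rating.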


Source journal: African Journal of Health Professions Education (HEALTH CARE SCIENCES & SERVICES)

Self-citation rate: 0.00%

Review time: 24 weeks