{"title":"Interrater Reliability at the Top End: Measures of Pilots’ Nontechnical Performance","authors":"Patrick Gontar, Hans-Juergen Hoermann","doi":"10.1080/10508414.2015.1162636","DOIUrl":null,"url":null,"abstract":"Objective: The aim of this study was to analyze influences on interrater reliability and within-group agreement within a highly experienced rater group when assessing pilots’ nontechnical skills. Background: Nontechnical skills of pilots are crucial for the conduct of safe flight operations. To train and assess these skills, reliable expert ratings are required. Literature shows to some degree that interrater reliability is influenced by factors related to the targets, scenarios, rating tools, or the raters themselves. Method: Thirty-seven type-rating examiners from a European airline assessed the performance of 4 flight crews based on video recordings using LOSA and adapted NOTECHS tools. We calculated rwg and ICC(3) to measure within-group agreement and interrater reliability. Results: The findings indicated that within-group agreement and interrater reliability were not always acceptable. It was shown that the performance of outstanding pilots was rated with the highest within-group agreement. For cognitive aspects of performance, interrater reliability was higher than for social aspects of performance. Agreement was lower on the pass–fail level than for the distinguished performance scales. Conclusion: These results suggest pass–fail decisions should not be based exclusively on nontechnical skill ratings. We furthermore recommend that regulatory authorities more systematically address interrater reliability in airline instructor training. 
Airlines as well as training facilities should be encouraged to demonstrate sufficient interrater reliability when using their rating tools.","PeriodicalId":83071,"journal":{"name":"The International journal of aviation psychology","volume":"25 1","pages":"171 - 190"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10508414.2015.1162636","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The International journal of aviation psychology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/10508414.2015.1162636","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 13
Abstract
Objective: The aim of this study was to analyze influences on interrater reliability and within-group agreement within a highly experienced rater group when assessing pilots' nontechnical skills. Background: Pilots' nontechnical skills are crucial for the conduct of safe flight operations. Training and assessing these skills requires reliable expert ratings. The literature indicates that, to some degree, interrater reliability is influenced by factors related to the targets, scenarios, rating tools, or the raters themselves. Method: Thirty-seven type-rating examiners from a European airline assessed the performance of four flight crews from video recordings, using LOSA and adapted NOTECHS rating tools. We calculated rwg and ICC(3) to measure within-group agreement and interrater reliability. Results: The findings indicated that within-group agreement and interrater reliability were not always acceptable. The performance of outstanding pilots was rated with the highest within-group agreement. Interrater reliability was higher for cognitive aspects of performance than for social aspects. Agreement was lower at the pass–fail level than for the differentiated performance scales. Conclusion: These results suggest that pass–fail decisions should not be based exclusively on nontechnical skill ratings. We furthermore recommend that regulatory authorities address interrater reliability more systematically in airline instructor training. Airlines as well as training facilities should be encouraged to demonstrate sufficient interrater reliability when using their rating tools.
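The two statistics named in the Method section have standard textbook definitions: rwg compares the observed variance of ratings on a target with the variance expected under a uniform (no-agreement) distribution (James, Demaree, & Wolf, 1984), and ICC(3,1) is the two-way mixed-effects, single-rater intraclass correlation from Shrout and Fleiss (1979). The sketch below is illustrative only, not the authors' code, and the sample data are hypothetical:

```python
# Illustrative sketch (not the study's analysis code) of the two indices
# named in the abstract: r_wg (within-group agreement, James et al., 1984)
# and ICC(3,1) (interrater reliability, Shrout & Fleiss, 1979).

def r_wg(ratings, n_categories):
    """r_wg = 1 - s^2 / sigma_EU^2, where sigma_EU^2 = (A^2 - 1) / 12 is the
    variance of a uniform (no-agreement) distribution over A rating categories."""
    n = len(ratings)
    mean = sum(ratings) / n
    s2 = sum((x - mean) ** 2 for x in ratings) / (n - 1)  # sample variance
    sigma_eu2 = (n_categories ** 2 - 1) / 12
    return 1 - s2 / sigma_eu2

def icc3(table):
    """ICC(3,1) = (MS_targets - MS_error) / (MS_targets + (k-1) * MS_error)
    for a fully crossed design: table[i][j] = rating of target i by rater j."""
    n = len(table)            # number of targets (e.g., crews)
    k = len(table[0])         # number of raters
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(table[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between raters
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Hypothetical example: 3 targets rated by 2 raters on a 5-point scale.
agreement = r_wg([3, 3, 4, 4], n_categories=5)
reliability = icc3([[1, 1], [2, 2], [3, 3]])
```

Note the design choice the paper's distinction rests on: rwg asks whether raters give a target similar absolute scores, while ICC(3) asks whether raters rank-order targets consistently, so the two can diverge (e.g., high agreement on outstanding crews alongside modest reliability overall).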