{"title":"Interrater Reliability at the Top End: Measures of Pilots’ Nontechnical Performance","authors":"Patrick Gontar, Hans-Juergen Hoermann","doi":"10.1080/10508414.2015.1162636","DOIUrl":null,"url":null,"abstract":"Objective: The aim of this study was to analyze influences on interrater reliability and within-group agreement within a highly experienced rater group when assessing pilots’ nontechnical skills. Background: Nontechnical skills of pilots are crucial for the conduct of safe flight operations. To train and assess these skills, reliable expert ratings are required. Literature shows to some degree that interrater reliability is influenced by factors related to the targets, scenarios, rating tools, or the raters themselves. Method: Thirty-seven type-rating examiners from a European airline assessed the performance of 4 flight crews based on video recordings using LOSA and adapted NOTECHS tools. We calculated rwg and ICC(3) to measure within-group agreement and interrater reliability. Results: The findings indicated that within-group agreement and interrater reliability were not always acceptable. It was shown that the performance of outstanding pilots was rated with the highest within-group agreement. For cognitive aspects of performance, interrater reliability was higher than for social aspects of performance. Agreement was lower on the pass–fail level than for the distinguished performance scales. Conclusion: These results suggest pass–fail decisions should not be based exclusively on nontechnical skill ratings. We furthermore recommend that regulatory authorities more systematically address interrater reliability in airline instructor training. 
Airlines as well as training facilities should be encouraged to demonstrate sufficient interrater reliability when using their rating tools.","PeriodicalId":83071,"journal":{"name":"The International journal of aviation psychology","volume":"25 1","pages":"171 - 190"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10508414.2015.1162636","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The International journal of aviation psychology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/10508414.2015.1162636","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 13
Abstract
Objective: The aim of this study was to analyze influences on interrater reliability and within-group agreement within a highly experienced rater group when assessing pilots' nontechnical skills. Background: Pilots' nontechnical skills are crucial for the conduct of safe flight operations. Training and assessing these skills requires reliable expert ratings. The literature indicates that, to some degree, interrater reliability is influenced by factors related to the targets, scenarios, rating tools, or the raters themselves. Method: Thirty-seven type-rating examiners from a European airline assessed the performance of four flight crews from video recordings, using LOSA and adapted NOTECHS rating tools. We calculated rwg and ICC(3) to measure within-group agreement and interrater reliability. Results: The findings indicated that within-group agreement and interrater reliability were not always acceptable. The performance of outstanding pilots was rated with the highest within-group agreement. Interrater reliability was higher for cognitive aspects of performance than for social aspects. Agreement was lower at the pass–fail level than for the differentiated performance scales. Conclusion: These results suggest that pass–fail decisions should not be based exclusively on nontechnical skill ratings. We furthermore recommend that regulatory authorities address interrater reliability more systematically in airline instructor training. Airlines as well as training facilities should be encouraged to demonstrate sufficient interrater reliability when using their rating tools.
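The two statistics named in the Method section have standard textbook definitions: rwg compares the observed variance of ratings on a target with the variance expected under a uniform (no-agreement) distribution (James, Demaree, & Wolf, 1984), and ICC(3,1) is the two-way mixed-effects, single-rater intraclass correlation from Shrout and Fleiss (1979). The sketch below is illustrative only, not the authors' code, and the sample data are hypothetical:

```python
# Illustrative sketch (not the study's analysis code) of the two indices
# named in the abstract: r_wg (within-group agreement, James et al., 1984)
# and ICC(3,1) (interrater reliability, Shrout & Fleiss, 1979).

def r_wg(ratings, n_categories):
    """r_wg = 1 - s^2 / sigma_EU^2, where sigma_EU^2 = (A^2 - 1) / 12 is the
    variance of a uniform (no-agreement) distribution over A rating categories."""
    n = len(ratings)
    mean = sum(ratings) / n
    s2 = sum((x - mean) ** 2 for x in ratings) / (n - 1)  # sample variance
    sigma_eu2 = (n_categories ** 2 - 1) / 12
    return 1 - s2 / sigma_eu2

def icc3(table):
    """ICC(3,1) = (MS_targets - MS_error) / (MS_targets + (k-1) * MS_error)
    for a fully crossed design: table[i][j] = rating of target i by rater j."""
    n = len(table)            # number of targets (e.g., crews)
    k = len(table[0])         # number of raters
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(table[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between raters
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Hypothetical example: 3 targets rated by 2 raters on a 5-point scale.
agreement = r_wg([3, 3, 4, 4], n_categories=5)
reliability = icc3([[1, 1], [2, 2], [3, 3]])
```

Note the design choice the paper's distinction rests on: rwg asks whether raters give a target similar absolute scores, while ICC(3) asks whether raters rank-order targets consistently, so the two can diverge (e.g., high agreement on outstanding crews alongside modest reliability overall).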