COVID-19 pneumonia chest radiographic severity score: variability assessment among experienced and in-training radiologists and creation of a multi-reader composite score database for artificial intelligence algorithm development.
M. van Assen, Mohammadreza Zandehshahvar, H. Maleki, Y. Kiarashi, T. Arleo, A. Stillman, Peter D Filev, A. Davarpanah, E. Berkowitz, S. Tigges, Scott J. Lee, B. Vey, A. Adibi, C. D. De Cecco
The British Journal of Radiology. Published 2022-04-22. DOI: 10.1259/bjr.20211028 (https://doi.org/10.1259/bjr.20211028)
Abstract
OBJECTIVE
To evaluate variability between experienced and in-training radiologists in grading COVID-19 pneumonia severity on chest radiographs (CXRs), and to create a multi-reader database suitable for AI development.
METHODS
CXRs from PCR-positive COVID-19 patients were reviewed. Six experienced cardiothoracic radiologists and two residents classified each CXR according to severity. One radiologist performed the classification twice to assess intra-observer variability. Severity was graded on a four-class scale: normal (0), mild, moderate, and severe. A median severity score (Rad Med) was determined for each CXR from the six radiologists' scores to build a multi-reader database (XCOMS). Kendall's tau correlation and percentage of disagreement were calculated to assess variability.
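The three quantities named in the methods (per-CXR median score, Kendall's tau, and percentage of disagreement) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say which tau variant was used, so the tie-corrected tau-b is an assumption here, and the reader names and scores are made-up toy data.

```python
from itertools import combinations
from statistics import median


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length score lists (tie-corrected).

    Assumption: the abstract only says "Kendall tau"; tau-b is chosen here
    because a four-class scale produces many tied pairs.
    """
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from both denominator terms
        if dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = ((untied + ties_x) * (untied + ties_y)) ** 0.5
    return (concordant - discordant) / denom if denom else float("nan")


def percent_disagreement(x, y):
    """Percentage of images on which two readers assign different classes."""
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    return 100.0 * sum(a != b for a, b in zip(x, y)) / len(x)


# Hypothetical toy data: three readers scoring five CXRs on a 0-3 scale.
scores = {
    "rad_1": [0, 1, 2, 3, 1],
    "rad_2": [0, 1, 2, 3, 2],
    "rad_3": [0, 2, 2, 3, 1],
}

# Composite label per CXR: the median across readers (the "Rad Med" idea).
rad_med = [median(per_image) for per_image in zip(*scores.values())]

tau_12 = kendall_tau_b(scores["rad_1"], scores["rad_2"])
disagree_12 = percent_disagreement(scores["rad_1"], scores["rad_2"])

print("median scores:", rad_med)
print("tau-b rad_1 vs rad_2:", round(tau_12, 3))
print("disagreement rad_1 vs rad_2 (%):", disagree_12)
```

With an even number of readers, as in the study's six radiologists, `statistics.median` can return a half-step value such as 1.5; how such cases were resolved is not stated in the abstract.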
RESULTS
A total of 397 patients (1208 CXRs) were included (mean age, 60 years [SD ±1]; 189 men). Inter-observer correlation among the radiologists ranged from 0.67 to 0.78. Against the Rad Med score, the individual radiologists showed good correlation (0.79 to 0.88). Residents showed slightly lower inter-observer agreement: 0.66 with each other and 0.69 to 0.71 with experienced radiologists. Intra-observer agreement was high, with a correlation coefficient of 0.77. In 220 (18%), 707 (59%), 259 (21%), and 22 (2%) CXRs there was a 0-, 1-, 2-, or 3-class difference, respectively. In 594 (50%) CXRs the median scores of the residents and the radiologists were similar; in 578 (48%) and 36 (3%) CXRs there was a 1- and 2-class difference, respectively.
CONCLUSION
Experienced and in-training radiologists demonstrate good inter- and intra-observer agreement in COVID-19 pneumonia severity classification. A higher percentage of disagreement was observed in moderate cases, which may affect the training of AI algorithms.
ADVANCES IN KNOWLEDGE
Most AI algorithms are trained on data labeled by a single expert. This study shows that for COVID-19 X-ray severity classification there is significant variability and disagreement between radiologists and between residents.