COVID-19 pneumonia chest radiographic severity score: Variability assessment among experienced and In-training radiologists and creation of a Multi-reader composite score database for artificial intelligence algorithm development.

The British journal of radiology. Pub Date: 2022-04-22. DOI: 10.1259/bjr.20211028

M. van Assen, Mohammadreza Zandehshahvar, H. Maleki, Y. Kiarashi, T. Arleo, A. Stillman, Peter D Filev, A. Davarpanah, E. Berkowitz, S. Tigges, Scott J. Lee, B. Vey, A. Adibi, C. D. De Cecco



Abstract

OBJECTIVE
To evaluate variability between experienced and in-training radiologists in rating COVID-19 pneumonia severity on chest radiographs (CXRs), and to create a multi-reader database suitable for AI development.

METHODS
CXRs from PCR-positive COVID-19 patients were reviewed. Six experienced cardiothoracic radiologists and two residents classified each CXR according to severity. One radiologist performed the classification twice to assess intraobserver variability. Severity was classified using a four-class system: normal (0), mild, moderate, and severe. A median severity score (Rad Med) across the six radiologists was determined for each CXR to build a multi-reader database (XCOMS). Kendall's tau correlation and percentage of disagreement were calculated to assess variability.

RESULTS
A total of 397 patients (1208 CXRs) were included (mean age, 60 years [SD ±1]; 189 men). Interobserver agreement (Kendall's tau) among the radiologists ranged from 0.67 to 0.78. Compared with the Rad Med score, the individual radiologists showed good correlation, ranging from 0.79 to 0.88. Residents showed slightly lower interobserver agreement: 0.66 with each other and 0.69 to 0.71 with the experienced radiologists. Intraobserver agreement was high, with a correlation coefficient of 0.77. A class difference of 0, 1, 2, or 3 was observed in 220 (18%), 707 (59%), 259 (21%), and 22 (2%) CXRs, respectively. In 594 (50%) CXRs the median scores of the residents and the radiologists were similar; in 578 (48%) and 36 (3%) CXRs there was a 1-class and 2-class difference, respectively.

CONCLUSION
Experienced and in-training radiologists demonstrate good inter- and intraobserver agreement in COVID-19 pneumonia severity classification. A higher percentage of disagreement was observed in moderate cases, which may affect training of AI algorithms.

ADVANCES IN KNOWLEDGE
Most AI algorithms are trained on data labeled by a single expert. This study shows that for COVID-19 CXR severity classification there is significant variability and disagreement among radiologists and among residents.
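The abstract names three quantities: a per-image median composite score, a class difference between readers, and Kendall's tau correlation. The sketch below shows one way these could be computed. It is not the authors' code. The ratings are hypothetical toy data, the function names are invented for illustration, and two choices are assumptions: using the tau-b variant (which corrects for ties), and leaving an even-count median as a half-class value (the abstract does not say how such cases were resolved).

```python
from collections import Counter
from itertools import combinations
from statistics import median

# Hypothetical toy data (NOT from the study): one row per CXR, one column
# per reader. Classes: 0 = normal, 1 = mild, 2 = moderate, 3 = severe.
ratings = [
    [0, 0, 1, 0, 0, 0],
    [1, 2, 1, 1, 2, 1],
    [2, 2, 3, 2, 1, 2],
    [3, 3, 3, 2, 3, 3],
    [1, 1, 2, 2, 2, 1],
]


def composite_score(scores):
    """Median of the readers' scores for one image.

    With an even number of readers the median can land between two
    classes (e.g. 1.5); how to resolve that is left open here.
    """
    return median(scores)


def class_difference(scores):
    """Largest gap, in classes, between any two readers for one image."""
    return max(scores) - min(scores)


def kendall_tau_b(x, y):
    """Kendall's tau-b between two readers' scores over the same images."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied for both readers: contributes nothing
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_x = concordant + discordant + tied_y_only  # pairs not tied in x
    untied_y = concordant + discordant + tied_x_only  # pairs not tied in y
    denom = (untied_x * untied_y) ** 0.5
    return (concordant - discordant) / denom if denom else float("nan")


composites = [composite_score(row) for row in ratings]
differences = Counter(class_difference(row) for row in ratings)
reader_a = [row[0] for row in ratings]
reader_b = [row[1] for row in ratings]
tau_ab = kendall_tau_b(reader_a, reader_b)

print("composite scores:", composites)
print("class-difference counts:", dict(differences))
print("tau-b, reader A vs reader B:", round(tau_ab, 3))
```

The percentage of disagreement then follows by dividing each class-difference count by the number of images. In practice a library routine such as `scipy.stats.kendalltau` would likely replace the hand-written loop; the plain-Python version is used here only to keep the sketch dependency-free.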


