Efficient labelling for efficient deep learning: the benefit of a multiple-image-ranking method to generate high volume training data applied to ventricular slice level classification in cardiac MRI.
Sameer Zaman, Kavitha Vimalesvaran, James P Howard, Digby Chappell, Marta Varela, Nicholas S Peters, Darrel P Francis, Anil A Bharath, Nick W F Linton, Graham D Cole
{"title":"Efficient labelling for efficient deep learning: the benefit of a multiple-image-ranking method to generate high volume training data applied to ventricular slice level classification in cardiac MRI.","authors":"Sameer Zaman, Kavitha Vimalesvaran, James P Howard, Digby Chappell, Marta Varela, Nicholas S Peters, Darrel P Francis, Anil A Bharath, Nick W F Linton, Graham D Cole","doi":"10.21037/jmai-22-55","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Getting the most value from expert clinicians' limited labelling time is a major challenge for artificial intelligence (AI) development in clinical imaging. We present a novel method for ground-truth labelling of cardiac magnetic resonance imaging (CMR) image data by leveraging multiple clinician experts ranking multiple images on a single ordinal axis, rather than manual labelling of one image at a time. We apply this strategy to train a deep learning (DL) model to classify the anatomical position of CMR images. This allows the automated removal of slices that do not contain the left ventricular (LV) myocardium.</p><p><strong>Methods: </strong>Anonymised LV short-axis slices from 300 random scans (3,552 individual images) were extracted. Each image's anatomical position relative to the LV was labelled using two different strategies performed for 5 hours each: (I) 'one-image-at-a-time': each image labelled according to its position: 'too basal', 'LV', or 'too apical' individually by one of three experts; and (II) 'multiple-image-ranking': three independent experts ordered slices according to their relative position from 'most-basal' to 'most apical' in batches of eight until each image had been viewed at least 3 times. Two convolutional neural networks were trained for a three-way classification task (each model using data from one labelling strategy). 
The models' performance was evaluated by accuracy, F1-score, and area under the receiver operating characteristics curve (ROC AUC).</p><p><strong>Results: </strong>After excluding images with artefact, 3,323 images were labelled by both strategies. The model trained using labels from the 'multiple-image-ranking strategy' performed better than the model using the 'one-image-at-a-time' labelling strategy (accuracy 86% <i>vs.</i> 72%, P=0.02; F1-score 0.86 <i>vs.</i> 0.75; ROC AUC 0.95 <i>vs.</i> 0.86). For expert clinicians performing this task manually the intra-observer variability was low (Cohen's κ=0.90), but the inter-observer variability was higher (Cohen's κ=0.77).</p><p><strong>Conclusions: </strong>We present proof of concept that, given the same clinician labelling effort, comparing multiple images side-by-side using a 'multiple-image-ranking' strategy achieves ground truth labels for DL more accurately than by classifying images individually. We demonstrate a potential clinical application: the automatic removal of unrequired CMR images. This leads to increased efficiency by focussing human and machine attention on images which are needed to answer clinical questions.</p>","PeriodicalId":73815,"journal":{"name":"Journal of medical artificial intelligence","volume":"6 ","pages":"4"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614685/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of medical artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21037/jmai-22-55","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Getting the most value from expert clinicians' limited labelling time is a major challenge for artificial intelligence (AI) development in clinical imaging. We present a novel method for ground-truth labelling of cardiac magnetic resonance imaging (CMR) image data in which multiple expert clinicians rank multiple images along a single ordinal axis, rather than manually labelling one image at a time. We apply this strategy to train a deep learning (DL) model to classify the anatomical position of CMR images, allowing the automated removal of slices that do not contain the left ventricular (LV) myocardium.
Methods: Anonymised LV short-axis slices from 300 random scans (3,552 individual images) were extracted. Each image's anatomical position relative to the LV was labelled using two different strategies, each performed for 5 hours: (I) 'one-image-at-a-time': each image was individually labelled 'too basal', 'LV', or 'too apical' by one of three experts; and (II) 'multiple-image-ranking': three independent experts ordered slices by their relative position from 'most basal' to 'most apical' in batches of eight until each image had been viewed at least three times. Two convolutional neural networks were trained for a three-way classification task (each model using data from one labelling strategy). The models' performance was evaluated by accuracy, F1-score, and area under the receiver operating characteristic curve (ROC AUC).
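The abstract does not state how the experts' batch orderings were combined into per-image class labels. A minimal sketch of one plausible aggregation, assuming a Borda-style mean-rank consensus followed by thresholding into the three classes (the function names and cut-offs here are illustrative assumptions, not taken from the paper):

```python
from collections import defaultdict

def consensus_positions(rankings):
    """Aggregate several experts' batch orderings into a mean normalised
    rank per image (0 = most basal, 1 = most apical). Each ranking is a
    list of image IDs ordered from most basal to most apical."""
    scores = defaultdict(list)
    for order in rankings:
        for rank, image_id in enumerate(order):
            scores[image_id].append(rank / (len(order) - 1))
    return {img: sum(r) / len(r) for img, r in scores.items()}

def to_three_classes(score, basal_cut=0.15, apical_cut=0.85):
    """Map a normalised ordinal score to the paper's three labels.
    The cut-offs are illustrative, not from the study."""
    if score < basal_cut:
        return "too basal"
    if score > apical_cut:
        return "too apical"
    return "LV"

# Three experts each order the same batch of eight slices (IDs a-h)
rankings = [
    list("abcdefgh"),
    list("bacdefgh"),
    list("abcdefhg"),
]
scores = consensus_positions(rankings)
labels = {img: to_three_classes(s) for img, s in scores.items()}
# e.g. slice 'a' -> 'too basal', 'e' -> 'LV', 'h' -> 'too apical'
```

A rank-based consensus like this is one reason side-by-side ordering can outperform per-image labelling: disagreements average out along the ordinal axis instead of producing conflicting discrete labels.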
Results: After excluding images with artefact, 3,323 images were labelled by both strategies. The model trained using labels from the 'multiple-image-ranking' strategy performed better than the model trained using the 'one-image-at-a-time' strategy (accuracy 86% vs. 72%, P=0.02; F1-score 0.86 vs. 0.75; ROC AUC 0.95 vs. 0.86). For expert clinicians performing this task manually, intra-observer variability was low (Cohen's κ=0.90), but inter-observer variability was higher (Cohen's κ=0.77).
Conclusions: We present proof of concept that, for the same clinician labelling effort, comparing multiple images side by side using a 'multiple-image-ranking' strategy produces more accurate ground-truth labels for DL than classifying images individually. We demonstrate a potential clinical application: the automatic removal of unrequired CMR images. This increases efficiency by focusing human and machine attention on the images needed to answer clinical questions.