Incorporating label uncertainty during the training of convolutional neural networks improves performance for the discrimination between certain and inconclusive cases in dopamine transporter SPECT

IF 8.6 · Zone 1 (Medicine) · Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

European Journal of Nuclear Medicine and Molecular Imaging · Pub Date: 2024-11-27 · DOI: 10.1007/s00259-024-06988-0

Aleksej Kucerenko, Thomas Buddenkotte, Ivayla Apostolova, Susanne Klutmann, Christian Ledig, Ralph Buchert



Abstract

Purpose

Deep convolutional neural networks (CNNs) hold promise for assisting the interpretation of dopamine transporter (DAT)-SPECT. For improved communication of uncertainty to the user, it is crucial to reliably discriminate certain cases from inconclusive cases that might be misclassified by strict application of a predefined decision threshold to the CNN output. This study tested two methods for incorporating existing label uncertainty during training to improve the utility of the CNN sigmoid output for this task.

Methods

Three datasets were used retrospectively: a “development” dataset (n = 1740) for CNN training, validation and testing, and two independent out-of-distribution datasets (n = 640 and n = 645) for testing only. In the development dataset, binary classification based on visual inspection was performed carefully by three well-trained readers. A ResNet-18 architecture was trained for binary classification of DAT-SPECT using one of three reference standards derived from the three readers: a randomly selected vote (“random vote training”, RVT), the proportion of “reduced” votes (“average vote training”, AVT), or the majority vote (MVT). Balanced accuracy was computed separately for “inconclusive” sigmoid outputs (within a predefined interval around the 0.5 decision threshold) and for “certain” (non-inconclusive) sigmoid outputs.
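The three reference standards differ only in how the readers' votes become a training target. The sketch below is a minimal illustration, not the authors' code. It assumes each vote is encoded as 1 for “reduced” and 0 for “normal”, a binary cross-entropy loss on the sigmoid output (the abstract does not name the loss), and, for RVT, a vote drawn afresh on each call (the abstract does not say whether the random vote is fixed once or redrawn). All function names are hypothetical.

```python
import math
import random
from typing import Sequence

# Assumed encoding (hypothetical): 1 = "reduced", 0 = "normal" for each reader's vote.

def majority_vote_target(votes: Sequence[int]) -> float:
    """MVT: hard label from the majority vote (no ties with three readers)."""
    return 1.0 if 2 * sum(votes) > len(votes) else 0.0

def average_vote_target(votes: Sequence[int]) -> float:
    """AVT: soft label equal to the proportion of "reduced" votes."""
    return sum(votes) / len(votes)

def random_vote_target(votes: Sequence[int], rng: random.Random) -> float:
    """RVT: hard label copied from one randomly selected reader.

    Assumption: a new reader is drawn on every call (e.g. every epoch).
    """
    return float(rng.choice(votes))

def binary_cross_entropy(p: float, target: float, eps: float = 1e-7) -> float:
    """BCE for a sigmoid output p; the target may be soft (any value in [0, 1])."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

votes = [1, 1, 0]  # two readers say "reduced", one says "normal"
rng = random.Random(0)
print(majority_vote_target(votes))     # 1.0
print(average_vote_target(votes))      # 0.6666666666666666
print(random_vote_target(votes, rng))  # 1.0 or 0.0 depending on the draw
```

Because binary cross-entropy is linear in the target, the expected RVT loss over redrawn votes equals the AVT loss, and for a soft target t the loss is minimized at p = t. One plausible reading, not stated in the abstract, is that both schemes therefore let reader-discordant cases settle near the middle of the sigmoid range instead of being pushed toward 0 or 1.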

Results

The proportion of “inconclusive” test cases that had to be accepted to achieve a given balanced accuracy in the “certain” test cases was lower with RVT and AVT than with MVT in all datasets (e.g., 1.9% and 1.2% versus 2.8% for 98% balanced accuracy in “certain” test cases from the development dataset). In addition, RVT and AVT resulted in slightly higher balanced accuracy across all test cases regardless of their certainty (97.3% and 97.5% versus 97.0% in the development dataset).
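The evaluation behind these results can be illustrated in the same way. This sketch assumes a symmetric inconclusive interval |p - 0.5| < δ around the decision threshold and finds the smallest δ, swept over the observed distances of the outputs from 0.5, at which balanced accuracy (the mean of sensitivity and specificity) on the remaining certain cases reaches a target. It returns the fraction of cases that then fall inside the inconclusive interval. The helper names and the sweep strategy are illustrative assumptions rather than the study's published procedure.

```python
from typing import List, Optional, Sequence, Tuple

Case = Tuple[float, int]  # (sigmoid output, reference label)

def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity; assumes both classes are present."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)

def split_by_certainty(
    probs: Sequence[float], labels: Sequence[int], half_width: float, threshold: float = 0.5
) -> Tuple[List[Case], List[Case]]:
    """Hypothetical helper: outputs with |p - threshold| < half_width are 'inconclusive'."""
    certain: List[Case] = []
    inconclusive: List[Case] = []
    for p, y in zip(probs, labels):
        if abs(p - threshold) < half_width:
            inconclusive.append((p, y))
        else:
            certain.append((p, y))
    return certain, inconclusive

def inconclusive_fraction_for_target(
    probs: Sequence[float], labels: Sequence[int], target_ba: float, threshold: float = 0.5
) -> Optional[float]:
    """Smallest inconclusive fraction at which the certain cases reach target_ba."""
    # Candidate half-widths: 0 (no inconclusive interval) and, for each case, a value
    # just above its distance to the threshold so that this case becomes inconclusive.
    candidates = sorted({0.0} | {abs(p - threshold) + 1e-9 for p in probs})
    for half_width in candidates:  # widening the interval only ever adds inconclusive cases
        certain, inconclusive = split_by_certainty(probs, labels, half_width, threshold)
        y_true = [y for _, y in certain]
        if 0 in y_true and 1 in y_true:  # balanced accuracy needs both classes
            y_pred = [int(p >= threshold) for p, _ in certain]
            if balanced_accuracy(y_true, y_pred) >= target_ba:
                return len(inconclusive) / len(probs)
    return None  # target not reachable on this data

probs = [0.95, 0.9, 0.55, 0.45, 0.1, 0.05]  # toy sigmoid outputs
labels = [1, 1, 0, 1, 0, 0]                 # toy reference labels
print(inconclusive_fraction_for_target(probs, labels, 0.9))  # 0.3333333333333333
```

With these toy outputs, the two misclassified cases lie closest to 0.5, so declaring those two of six cases inconclusive lifts balanced accuracy on the certain cases from 2/3 to 1.0. In the sense used in the Results, a training scheme performs better when it needs a smaller inconclusive fraction to reach the same target.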

Conclusion

Making between-reader discrepancy known to the CNN during training improves the utility of its sigmoid output for discriminating certain cases from inconclusive cases that might be misclassified when the predefined decision threshold is strictly applied. This does not compromise overall accuracy.




Source journal

European Journal of Nuclear Medicine and Molecular Imaging (Medicine - Nuclear Medicine)

CiteScore: 15.60
Self-citation rate: 9.90%
Articles published: 392
Review time: 3 months

About the journal: The European Journal of Nuclear Medicine and Molecular Imaging serves as a platform for the exchange of clinical and scientific information within nuclear medicine and related professions. It welcomes international submissions from professionals involved in the functional, metabolic, and molecular investigation of diseases. Its coverage spans physics, dosimetry, radiation biology, radiochemistry, and pharmacy, and submissions receive high-quality peer review by experts in the field. Known for highly cited and downloaded articles, it ensures global visibility for research work and is part of the EJNMMI journal family.