Combating Medical Label Noise through more precise partition-correction and progressive hard-enhanced learning

IF 4.9 · CAS Zone 2 (Medicine) · JCR Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer Methods and Programs in Biomedicine · Pub Date: 2025-03-29 · DOI: 10.1016/j.cmpb.2025.108734

Sanyan Zhang, Surong Chu, Yan Qiang, Juanjuan Zhao, Yan Wang, Xiao Wei

Volume 265, Article 108734. Full text: https://www.sciencedirect.com/science/article/pii/S0169260725001518

Citations: 0

Abstract

Background and Objective:

Computer-aided diagnosis systems based on deep neural networks rely heavily on datasets with high-quality labels. However, manual annotation for lesion diagnosis depends on image features and often requires professional experience and a complex image analysis process. This inevitably introduces noisy labels, which can misguide the training of classification models. Our goal is to design an effective method to address the challenges posed by label noise in medical images.

Methods:

We propose a novel noise-tolerant medical image classification framework consisting of two phases: fore-training correction and progressive hard-sample enhanced learning. In the first phase, we design a dual-branch sample partition detection scheme that classifies each instance into one of three subsets: clean, hard, or noisy. In the same phase, we propose a hard-sample label refinement strategy based on class prototypes with confidence-perception weighting, together with a joint correction method for noisy samples, which yields higher-quality training data. In the second phase, we design a progressive hard-sample reinforcement learning method to enhance the model’s ability to learn discriminative feature representations. This approach accounts for sample difficulty and mitigates the effects of label noise in medical datasets.
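The abstract does not state the criteria used by the two branches, so the following is only an illustrative sketch of a three-way clean/hard/noisy split, not the authors' method. It assumes two hypothetical signals: branch A checks whether the model's probability for the given label reaches a threshold, and branch B checks whether the nearest class prototype (the mean feature vector of a class) matches the given label. The function names, the cosine similarity, and `conf_threshold` are all assumptions made for the example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def class_prototypes(features, labels):
    """Return {class: mean feature vector} computed from the given labels."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, x in enumerate(f):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}


def partition(features, labels, probs, conf_threshold=0.5):
    """Split sample indices into clean / hard / noisy subsets.

    Branch A (assumed): model probability on the given label >= conf_threshold.
    Branch B (assumed): the nearest prototype's class equals the given label.
    Both branches accept -> clean; both reject -> noisy; they disagree -> hard.
    """
    protos = class_prototypes(features, labels)
    subsets = {"clean": [], "hard": [], "noisy": []}
    for i, (f, y, p) in enumerate(zip(features, labels, probs)):
        branch_a = p[y] >= conf_threshold
        nearest = max(protos, key=lambda c: cosine(f, protos[c]))
        branch_b = nearest == y
        if branch_a and branch_b:
            subsets["clean"].append(i)
        elif not branch_a and not branch_b:
            subsets["noisy"].append(i)
        else:
            subsets["hard"].append(i)
    return subsets


if __name__ == "__main__":
    # Toy 2-D features; sample 4 carries a deliberately wrong label.
    feats = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [1, 0.05], [0.6, 0.5]]
    labs = [0, 0, 1, 1, 1, 0]
    prbs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.7, 0.3]]
    print(partition(feats, labs, prbs))
```

In this toy setup, samples on which the two signals disagree land in the hard subset, which is where a prototype-based refinement step like the one described above would then operate.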

Results:

Our framework achieves an accuracy of 82.39% on the pneumoconiosis dataset collected by our laboratory. On a five-class skin disease dataset with six different levels of label noise (0, 0.05, 0.1, 0.2, 0.3, and 0.4), the average accuracy over the last ten epochs reaches 88.51%, 86.64%, 85.02%, 83.01%, 81.95%, and 77.89%, respectively. For binary polyp classification under noise rates of 0.2, 0.3, and 0.4, the average accuracy over the last ten epochs is 97.90%, 93.77%, and 89.33%, respectively.
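As a rough illustration of this evaluation protocol, the sketch below injects synthetic label noise at a given rate and averages a metric over the last ten epochs. The abstract does not say which noise model was used, so symmetric (uniform) label flipping is an assumption here, and the function names `inject_symmetric_noise` and `mean_last_k` are hypothetical.

```python
import random


def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a different, uniformly chosen class with probability noise_rate.

    Symmetric flipping is an assumed noise model, used only for illustration.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    for i, y in enumerate(labels):
        if rng.random() < noise_rate:
            noisy[i] = rng.choice([c for c in range(num_classes) if c != y])
    return noisy


def mean_last_k(values, k=10):
    """Average of the last k entries, e.g. per-epoch test accuracies."""
    tail = values[-k:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    clean = [i % 5 for i in range(1000)]  # toy five-class labels
    for rate in (0.0, 0.05, 0.1, 0.2, 0.3, 0.4):
        noisy = inject_symmetric_noise(clean, num_classes=5, noise_rate=rate)
        flipped = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
        print(f"target rate {rate:.2f}, observed flip fraction {flipped:.3f}")
```

Averaging over the final epochs rather than reporting the single best epoch reduces the influence of epoch-to-epoch fluctuation, which tends to be larger when training labels are noisy.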

Conclusions:

The effectiveness of our proposed framework is demonstrated through its performance on three challenging datasets with both real and synthetic noise. Experimental results further demonstrate the robustness of our method across varying noise rates.



Source journal

Computer methods and programs in biomedicine (Engineering and Technology: Engineering, Biomedical)

CiteScore

12.30

Self-citation rate

6.60%

Articles published

601

Review time

135 days

About the journal: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.