Automatic labels are as effective as manual labels in digital pathology images classification with deep learning

JCR quartile: Q2 (Medicine)
Niccolo Marini, Stefano Marchesin, Lluis Borras Ferris, Simon Püttmann, Marek Wodzinski, Riccardo Fratti, Damian Podareanu, Alessandro Caputo, Svetla Boytcheva, Simona Vatrano, Filippo Fraggetta, Iris Nagtegaal, Gianmaria Silvello, Manfredo Atzori, Henning Müller
Journal of Pathology Informatics, volume 18, Article 100462. Published 2025-08-01. DOI: 10.1016/j.jpi.2025.100462. Full text: https://www.sciencedirect.com/science/article/pii/S2153353925000483

Abstract

The increasing availability of biomedical data is helping to design more robust deep learning (DL) algorithms to analyze biomedical samples. Currently, one of the main limitations in training DL algorithms for a specific task is the need for medical experts to label the data manually. Automatic labeling methods exist; however, automatic labels can be noisy, and it is not entirely clear in which situations they can be used to train DL models.
This paper investigates under which circumstances automatic labels can be used to train a DL model for the classification of whole slide images (WSIs). The analysis involves multiple architectures, such as convolutional neural networks and vision transformers, and 10,604 WSIs as training data, collected from three use cases: celiac disease, lung cancer, and colon cancer, which involve binary, multiclass, and multilabel data, respectively. The results identify 10% as the percentage of noisy labels that can be tolerated before performance drops off; within this limit, effective WSI classification models can be trained, reaching F1-scores of 0.906, 0.757, and 0.833 on the three use cases, respectively. An algorithm that generates automatic labels therefore needs to stay within this range to be adopted, as shown by applying the Semantic Knowledge Extractor Tool to automatically extract concepts and use them as labels. In this case, automatic labels are as effective as manual labels: models trained on them achieve solid performance comparable to that of models trained on manual labels.
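To make the notion of a label-noise percentage concrete, the sketch below corrupts a set of clean labels at a chosen rate and measures how far the noisy labels drift from the clean ones. It is only an illustration under stated assumptions: the abstract does not describe how noise was injected or measured, so the helper names (`inject_label_noise`, `f1_binary`), the uniform random reassignment, and the synthetic balanced binary labels are all hypothetical and are not the authors' protocol. The F1-score computed here compares noisy labels against clean labels; it is not the model performance reported in the abstract.

```python
import random


def inject_label_noise(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` in which a fraction `noise_rate` of the
    entries is reassigned to a different, randomly chosen class.
    Hypothetical helper: uniform random reassignment is an assumption."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    # Sample distinct positions so exactly n_flip labels change.
    for i in rng.sample(range(len(noisy)), n_flip):
        alternatives = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


def f1_binary(y_true, y_pred, positive=1):
    """F1-score of the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Synthetic, balanced binary labels standing in for manual annotations.
clean = [i % 2 for i in range(1000)]
noisy = inject_label_noise(clean, noise_rate=0.10, num_classes=2)

changed = sum(c != n for c, n in zip(clean, noisy))
print(f"fraction of labels changed: {changed / len(clean):.2f}")
print(f"F1 of noisy vs. clean labels: {f1_binary(clean, noisy):.3f}")
```

In an experiment of the kind the abstract describes, the noisy labels would be used to train a classifier, and the F1-score would be computed on a test set with trusted labels; repeating this for several noise rates would show where performance starts to drop.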
Source journal
Journal of Pathology Informatics (Medicine: Pathology and Forensic Medicine)
CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 2
Review time: 18 weeks
Journal description: The Journal of Pathology Informatics (JPI) is an open access peer-reviewed journal dedicated to the advancement of pathology informatics. This is the official journal of the Association for Pathology Informatics (API). The journal aims to publish broadly about pathology informatics and freely disseminate all articles worldwide. This journal is of interest to pathologists, informaticians, academics, researchers, health IT specialists, information officers, IT staff, vendors, and anyone with an interest in informatics. We encourage submissions from anyone with an interest in the field of pathology informatics. We publish all types of papers related to pathology informatics including original research articles, technical notes, reviews, viewpoints, commentaries, editorials, symposia, meeting abstracts, book reviews, and correspondence to the editors. All submissions are subject to rigorous peer review by the well-regarded editorial board and by expert referees in appropriate specialties.