Enhancing basal cell carcinoma classification in preoperative biopsies via transfer learning with weakly supervised graph transformers.

Impact factor 2.9; Zone 3, Medicine; JCR Q2, Radiology, Nuclear Medicine & Medical Imaging

BMC Medical Imaging. Pub Date: 2025-05-16. DOI: 10.1186/s12880-025-01710-4

Johan Björkman, Sigrid Lagerroth, Jan Siarov, Filmon Yacob, Noora Neittaanmäki

BMC Medical Imaging 25(1), 166. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12084905/pdf/

Citations: 0

Abstract

Background: Basal cell carcinoma (BCC) is the most common skin cancer, placing a significant burden on healthcare systems globally. Developing high-precision automated diagnostics requires large annotated datasets, which are costly and difficult to obtain. This study aimed to fine-tune a weakly supervised machine learning model to classify BCC in preoperative punch biopsies using transfer learning. By addressing challenges of scalability and variability, this approach seeks to enhance generalizability and diagnostic accuracy.

Methods: The Basal Cell Classification (BCCC) dataset included 514 whole-slide images (WSIs) of punch biopsies (261 with BCC and 253 tumor-free slides), divided into training (70%), validation (15%), and test (15%) sets. WSIs were split into patches, and features were extracted using a SimCLR model pretrained on 1,435 WSIs from BCC excisions. The features were assembled into graphs to preserve spatial information and then processed by a Vision Transformer. Testing compared fine-tuned and non-fine-tuned pretrained models with a model trained from scratch, all evaluated on 78 WSIs from the BCCC dataset. The COBRA dataset of 3,588 WSIs (1,794 with BCC and 1,794 without) was used for external validation. Models classified no-tumor vs. tumor (two classes), no-tumor vs. low-risk vs. high-risk tumors (three classes), and no-tumor vs. four BCC subtypes (five classes).
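The graph step in the Methods can be illustrated with a minimal sketch. The abstract does not say how patches are connected, so the rule used here (each patch is linked to the patches that touch it on the tiling grid, including diagonals) is an assumption, and the function name `build_patch_graph` and the toy coordinates are hypothetical. In the described pipeline, each node would additionally carry the feature vector extracted for that patch.

```python
from itertools import product


def build_patch_graph(coords):
    """Return sorted undirected edges between patches that touch on the grid.

    coords: list of (row, col) grid positions of the patches kept after
    tiling a slide; node i corresponds to coords[i].
    ASSUMPTION: 8-neighbour connectivity. The source abstract does not
    specify the adjacency rule.
    """
    index = {rc: i for i, rc in enumerate(coords)}
    edges = set()
    for (r, c), i in index.items():
        # Visit the eight surrounding grid cells.
        for dr, dc in product((-1, 0, 1), repeat=2):
            if (dr, dc) == (0, 0):
                continue
            j = index.get((r + dr, c + dc))
            if j is not None:
                # Store each undirected edge once, smaller index first.
                edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Toy slide: three touching patches and one isolated patch.
coords = [(0, 0), (0, 1), (1, 1), (3, 3)]
edges = build_patch_graph(coords)
print(edges)  # [(0, 1), (0, 2), (1, 2)]
```

The isolated patch at (3, 3) gets no edges, which is how gaps between tissue fragments would appear in such a graph.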

Results: The fine-tuned model significantly outperformed both the non-fine-tuned pretrained model and the model trained from scratch, with accuracies of 91.7%, 82.1%, and 75.3% and AUCs of 0.98, 0.95-0.98, and 0.91-0.97 for two-, three-, and five-class classification, respectively. On external validation, accuracies were 84.9% and 70.5%, with AUCs of 0.92 and 0.89-0.91 for two- and three-class classification, respectively. The ablation study showed that the fine-tuned model outperformed the model trained from scratch, improving mean accuracy by 10.6%, 11.7%, and 13.1% on the BCCC dataset and by 29.6% and 19.2% on the COBRA dataset.
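For readers less familiar with the two metrics reported above, the following is a minimal sketch of how accuracy and a binary AUC can be computed from slide-level predictions. It is not the authors' evaluation code: the function names and the toy labels and scores are hypothetical, and the AUC uses the standard pairwise-ranking definition. The AUC ranges quoted for the multi-class tasks may be per-class one-vs-rest values, but the abstract does not state this, so treat that reading as an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of slides whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_auc(y_true, scores):
    """AUC as the probability that a positive slide outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = tumor, 0 = no tumor.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # threshold at 0.5

acc = accuracy(y_true, y_pred)
auc = binary_auc(y_true, scores)
print(acc, auc)  # 0.75 0.75
```

For a one-vs-rest reading of a multi-class task, `binary_auc` would be called once per class with that class relabelled as 1 and all others as 0.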

Conclusions: The results suggest that transfer learning not only enhances model performance on small datasets but also supports robust feature extraction in complex histopathology tasks. These findings reinforce the utility of pre-trained models in computational pathology, where access to large, labeled datasets is often limited, and task-specific challenges require nuanced understanding of the visual data.




Source journal

BMC Medical Imaging (Radiology, Nuclear Medicine & Medical Imaging)

CiteScore: 4.60

Self-citation rate: 3.70%

Articles published: 198

Review time: 27 weeks

About the journal: BMC Medical Imaging is an open access journal publishing original peer-reviewed research articles in the development, evaluation, and use of imaging techniques and image processing tools to diagnose and manage disease.