Leveraging pretrained vision transformers for automated cancer diagnosis in optical coherence tomography images

Soumyajit Ray, Cheng-Yu Lee, Hyeon-Cheol Park, David W Nauen, Chetan Bettegowda, Xingde Li, Rama Chellappa

Biomedical Optics Express, 16(8), 3283-3294 (2025). DOI: 10.1364/BOE.563694. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12339304/pdf/
Abstract
This study presents an approach to brain cancer detection based on optical coherence tomography (OCT) images and advanced machine learning techniques. The research addresses the critical need for accurate, real-time differentiation between cancerous and noncancerous brain tissue during neurosurgical procedures. The proposed method combines a pretrained large vision transformer (ViT) model, specifically DINOv2, with a convolutional neural network (CNN) operating on grey-level co-occurrence matrix (GLCM) texture features. This dual-path architecture leverages both the global contextual feature extraction of transformers and the local texture analysis strengths of GLCM + CNNs. To mitigate patient-specific bias arising from the limited cohort, we incorporate an adversarial discriminator network that attempts to identify individual patients from the feature representations, creating a competing objective that forces the model to learn generalizable cancer-indicative features rather than patient-specific characteristics. We also explore an alternative state space model approach using MambaVision blocks, which achieves comparable performance. The dataset comprised OCT images from 11 patients: 5,831 B-frame slices from 7 patients were used for training and validation, and 1,610 slices from 4 patients were used for testing. The model achieved high accuracy in distinguishing cancerous from noncancerous tissue: over 99% on the training set, 98.8% on the validation set, and 98.6% on the test set. This approach demonstrates significant potential for improving intraoperative decision-making in brain cancer surgery, offering real-time, high-accuracy tissue classification and surgical guidance.
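To make the dual-path design concrete, below is a minimal PyTorch sketch of the idea as described in the abstract: a frozen pretrained DINOv2 ViT supplies global features, a small CNN processes GLCM texture inputs, and a patient discriminator trained through a gradient-reversal layer pushes the shared features to be patient-invariant. The layer sizes, fusion scheme, DINOv2 variant (ViT-S/14), and the arrangement of GLCM features as multi-channel maps are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scaled, sign-flipped gradient in the
    backward pass, so upstream layers are trained to fool the discriminator."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DualPathOCTClassifier(nn.Module):
    def __init__(self, n_patients, glcm_channels=6, lambd=1.0):
        super().__init__()
        # Path 1: frozen pretrained DINOv2 backbone (ViT-S/14, 384-d output).
        self.vit = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
        for p in self.vit.parameters():
            p.requires_grad = False
        # Path 2: small CNN over GLCM texture features, assumed here to be
        # stacked as one channel per GLCM property (contrast, homogeneity, ...).
        self.glcm_cnn = nn.Sequential(
            nn.Conv2d(glcm_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 64), nn.ReLU())
        fused_dim = 384 + 64
        self.lambd = lambd
        self.cancer_head = nn.Sequential(
            nn.Linear(fused_dim, 128), nn.ReLU(), nn.Linear(128, 2))
        # Adversarial head: tries to identify the patient from the fused
        # features; the gradient-reversal layer makes the trainable feature
        # layers learn patient-invariant representations instead.
        self.patient_head = nn.Sequential(
            nn.Linear(fused_dim, 128), nn.ReLU(), nn.Linear(128, n_patients))

    def forward(self, oct_image, glcm_maps):
        # oct_image: (B, 3, 224, 224) OCT B-frames replicated to 3 channels;
        # glcm_maps: (B, glcm_channels, H, W) precomputed texture maps.
        z = torch.cat([self.vit(oct_image), self.glcm_cnn(glcm_maps)], dim=1)
        cancer_logits = self.cancer_head(z)
        patient_logits = self.patient_head(GradientReversal.apply(z, self.lambd))
        return cancer_logits, patient_logits
```

During training, the total loss would be the sum of a cancer cross-entropy and a patient-identification cross-entropy; because of the gradient reversal, minimizing that combined loss trains the discriminator to recognize patients while simultaneously pushing the trainable feature layers to discard patient-identifying information.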
Journal Introduction
The journal's scope encompasses fundamental research, technology development, biomedical studies, and clinical applications. BOEx focuses on leading-edge topics in the field, including:
Tissue optics and spectroscopy
Novel microscopies
Optical coherence tomography
Diffuse and fluorescence tomography
Photoacoustic and multimodal imaging
Molecular imaging and therapies
Nanophotonic biosensing
Optical biophysics/photobiology
Microfluidic optical devices
Vision research.