Leveraging pretrained vision transformers for automated cancer diagnosis in optical coherence tomography images

Soumyajit Ray, Cheng-Yu Lee, Hyeon-Cheol Park, David W Nauen, Chetan Bettegowda, Xingde Li, Rama Chellappa

Biomedical Optics Express, 16(8), 3283-3294 (2025). DOI: 10.1364/BOE.563694. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12339304/pdf/
Abstract
This study presents an approach to brain cancer detection based on optical coherence tomography (OCT) images and advanced machine learning techniques. The research addresses the critical need for accurate, real-time differentiation between cancerous and noncancerous brain tissue during neurosurgical procedures. The proposed method combines a pretrained large vision transformer (ViT) model, specifically DINOv2, with a convolutional neural network (CNN) operating on grey-level co-occurrence matrix (GLCM) texture features. This dual-path architecture leverages both the global contextual feature extraction of transformers and the local texture analysis strengths of GLCM + CNNs. To mitigate patient-specific bias arising from the limited cohort, we incorporate an adversarial discriminator network that attempts to identify individual patients from the feature representations, creating a competing objective that forces the model to learn generalizable cancer-indicative features rather than patient-specific characteristics. We also explore an alternative state space model approach using MambaVision blocks, which achieves comparable performance. The dataset comprised OCT images from 11 patients: 5,831 B-frame slices from 7 patients were used for training and validation, and 1,610 slices from 4 patients were used for testing. The model achieved high accuracy in distinguishing cancerous from noncancerous tissue: over 99% on the training set, 98.8% on the validation set, and 98.6% on the test set. This approach demonstrates significant potential for improving intraoperative decision-making in brain cancer surgery, offering real-time, high-accuracy tissue classification and surgical guidance.
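To make the dual-path design concrete, below is a minimal PyTorch sketch of the idea as described in the abstract: a frozen pretrained DINOv2 ViT supplies global features, a small CNN processes GLCM texture inputs, and a patient discriminator trained through a gradient-reversal layer pushes the shared features to be patient-invariant. The layer sizes, fusion scheme, DINOv2 variant (ViT-S/14), and the arrangement of GLCM features as multi-channel maps are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scaled, sign-flipped gradient in the
    backward pass, so upstream layers are trained to fool the discriminator."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DualPathOCTClassifier(nn.Module):
    def __init__(self, n_patients, glcm_channels=6, lambd=1.0):
        super().__init__()
        # Path 1: frozen pretrained DINOv2 backbone (ViT-S/14, 384-d output).
        self.vit = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
        for p in self.vit.parameters():
            p.requires_grad = False
        # Path 2: small CNN over GLCM texture features, assumed here to be
        # stacked as one channel per GLCM property (contrast, homogeneity, ...).
        self.glcm_cnn = nn.Sequential(
            nn.Conv2d(glcm_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 64), nn.ReLU())
        fused_dim = 384 + 64
        self.lambd = lambd
        self.cancer_head = nn.Sequential(
            nn.Linear(fused_dim, 128), nn.ReLU(), nn.Linear(128, 2))
        # Adversarial head: tries to identify the patient from the fused
        # features; the gradient-reversal layer makes the trainable feature
        # layers learn patient-invariant representations instead.
        self.patient_head = nn.Sequential(
            nn.Linear(fused_dim, 128), nn.ReLU(), nn.Linear(128, n_patients))

    def forward(self, oct_image, glcm_maps):
        # oct_image: (B, 3, 224, 224) OCT B-frames replicated to 3 channels;
        # glcm_maps: (B, glcm_channels, H, W) precomputed texture maps.
        z = torch.cat([self.vit(oct_image), self.glcm_cnn(glcm_maps)], dim=1)
        cancer_logits = self.cancer_head(z)
        patient_logits = self.patient_head(GradientReversal.apply(z, self.lambd))
        return cancer_logits, patient_logits
```

During training, the total loss would be the sum of a cancer cross-entropy and a patient-identification cross-entropy; because of the gradient reversal, minimizing that combined loss trains the discriminator to recognize patients while simultaneously pushing the trainable feature layers to discard patient-identifying information.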
Journal Introduction
The journal's scope encompasses fundamental research, technology development, biomedical studies, and clinical applications. BOEx focuses on leading-edge topics in the field, including:
Tissue optics and spectroscopy
Novel microscopies
Optical coherence tomography
Diffuse and fluorescence tomography
Photoacoustic and multimodal imaging
Molecular imaging and therapies
Nanophotonic biosensing
Optical biophysics/photobiology
Microfluidic optical devices
Vision research.