{"title":"EFCM: Efficient Fine-tuning on Compressed Models for deployment of large models in medical image analysis","authors":"Shaojie Li, Zhaoshuo Diao","doi":"arxiv-2409.11817","DOIUrl":null,"url":null,"abstract":"The recent development of deep learning large models in medicine shows\nremarkable performance in medical image analysis and diagnosis, but their large\nnumber of parameters causes memory and inference latency challenges. Knowledge\ndistillation offers a solution, but the slide-level gradients cannot be\nbackpropagated for student model updates due to high-resolution pathological\nimages and slide-level labels. This study presents an Efficient Fine-tuning on\nCompressed Models (EFCM) framework with two stages: unsupervised feature\ndistillation and fine-tuning. In the distillation stage, Feature Projection\nDistillation (FPD) is proposed with a TransScan module for adaptive receptive\nfield adjustment to enhance the knowledge absorption capability of the student\nmodel. In the slide-level fine-tuning stage, three strategies (Reuse CLAM,\nRetrain CLAM, and End2end Train CLAM (ETC)) are compared. Experiments are\nconducted on 11 downstream datasets related to three large medical models:\nRETFound for retina, MRM for chest X-ray, and BROW for histopathology. The\nexperimental results demonstrate that the EFCM framework significantly improves\naccuracy and efficiency in handling slide-level pathological image problems,\neffectively addressing the challenges of deploying large medical models.\nSpecifically, it achieves a 4.33% increase in ACC and a 5.2% increase in AUC\ncompared to the large model BROW on the TCGA-NSCLC and TCGA-BRCA datasets. The\nanalysis of model inference efficiency highlights the high efficiency of the\ndistillation fine-tuning method.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11817","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Recent large deep learning models in medicine show remarkable performance in
medical image analysis and diagnosis, but their large number of parameters
creates memory and inference latency challenges. Knowledge distillation offers
a solution, but slide-level gradients cannot be backpropagated to update the
student model because pathological images are high-resolution and labels are
available only at the slide level. This study presents an Efficient Fine-tuning on
Compressed Models (EFCM) framework with two stages: unsupervised feature
distillation and fine-tuning. In the distillation stage, Feature Projection
Distillation (FPD) is proposed with a TransScan module for adaptive receptive
field adjustment to enhance the knowledge absorption capability of the student
model. In the slide-level fine-tuning stage, three strategies (Reuse CLAM,
Retrain CLAM, and End2end Train CLAM (ETC)) are compared. Experiments are
conducted on 11 downstream datasets related to three large medical models:
RETFound for retina, MRM for chest X-ray, and BROW for histopathology. The
experimental results demonstrate that the EFCM framework significantly improves
accuracy and efficiency in handling slide-level pathological image problems,
effectively addressing the challenges of deploying large medical models.
Specifically, it achieves a 4.33% increase in ACC and a 5.2% increase in AUC
compared to the large model BROW on the TCGA-NSCLC and TCGA-BRCA datasets. An
analysis of model inference further confirms the high efficiency of the
distillation fine-tuning method.
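
As a rough illustration of the unsupervised feature distillation stage summarized above, the PyTorch sketch below trains a small student so that its projected patch features match those of a frozen teacher under an MSE loss. The class name, feature dimensions, and the simple linear projection head are illustrative assumptions; the paper's actual FPD objective and its TransScan module for adaptive receptive-field adjustment are not reproduced here.

```python
import torch
import torch.nn as nn

class FeatureProjectionDistiller(nn.Module):
    """Minimal sketch of unsupervised feature distillation.

    A small student backbone is trained so that its projected features
    match those of a frozen, larger teacher. Illustration only; it does
    not implement the paper's FPD/TransScan design.
    """

    def __init__(self, student: nn.Module, student_dim: int, teacher_dim: int):
        super().__init__()
        self.student = student
        # Simple linear projection into the teacher's feature space
        # (an assumption standing in for the paper's projection module).
        self.project = nn.Linear(student_dim, teacher_dim)

    def forward(self, images: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
        student_feats = self.student(images)        # (B, student_dim)
        projected = self.project(student_feats)     # (B, teacher_dim)
        # Unsupervised objective: no labels, only feature matching.
        return nn.functional.mse_loss(projected, teacher_feats)

# Toy usage: distill patch-level features from a frozen teacher.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1024)).eval()
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 256))
distiller = FeatureProjectionDistiller(student, student_dim=256, teacher_dim=1024)
optimizer = torch.optim.AdamW(distiller.parameters(), lr=1e-4)

patches = torch.randn(8, 3, 224, 224)              # unlabeled image patches
with torch.no_grad():
    teacher_feats = teacher(patches)
loss = distiller(patches, teacher_feats)
loss.backward()
optimizer.step()
```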
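
The slide-level fine-tuning stage builds on CLAM-style attention pooling over patch features, which is what allows training with only slide-level labels. As a hedged sketch of that general idea (not the paper's Reuse/Retrain/ETC strategies, nor CLAM's clustering constraint), a gated-attention head can aggregate a bag of patch features into a single slide-level prediction:

```python
import torch
import torch.nn as nn

class AttentionMILHead(nn.Module):
    """Gated-attention pooling over patch features, in the spirit of CLAM.

    Illustrative sketch only: CLAM's clustering constraint and the paper's
    Reuse/Retrain/End2end (ETC) training strategies are not reproduced.
    """

    def __init__(self, feat_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (num_patches, feat_dim) for one whole slide.
        scores = self.attn_w(self.attn_v(patch_feats) * self.attn_u(patch_feats))
        weights = torch.softmax(scores, dim=0)        # attention over patches
        slide_feat = (weights * patch_feats).sum(dim=0)
        return self.classifier(slide_feat)            # slide-level logits

# Toy usage: one bag of (distilled) patch features, one slide-level label.
head = AttentionMILHead(feat_dim=256, hidden_dim=128, num_classes=2)
bag = torch.randn(500, 256)
logits = head(bag)
loss = nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([1]))
loss.backward()
```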