EFCM: Efficient Fine-tuning on Compressed Models for deployment of large models in medical image analysis

Shaojie Li, Zhaoshuo Diao
{"title":"EFCM:压缩模型上的高效微调,用于在医学图像分析中部署大型模型","authors":"Shaojie Li, Zhaoshuo Diao","doi":"arxiv-2409.11817","DOIUrl":null,"url":null,"abstract":"The recent development of deep learning large models in medicine shows\nremarkable performance in medical image analysis and diagnosis, but their large\nnumber of parameters causes memory and inference latency challenges. Knowledge\ndistillation offers a solution, but the slide-level gradients cannot be\nbackpropagated for student model updates due to high-resolution pathological\nimages and slide-level labels. This study presents an Efficient Fine-tuning on\nCompressed Models (EFCM) framework with two stages: unsupervised feature\ndistillation and fine-tuning. In the distillation stage, Feature Projection\nDistillation (FPD) is proposed with a TransScan module for adaptive receptive\nfield adjustment to enhance the knowledge absorption capability of the student\nmodel. In the slide-level fine-tuning stage, three strategies (Reuse CLAM,\nRetrain CLAM, and End2end Train CLAM (ETC)) are compared. Experiments are\nconducted on 11 downstream datasets related to three large medical models:\nRETFound for retina, MRM for chest X-ray, and BROW for histopathology. The\nexperimental results demonstrate that the EFCM framework significantly improves\naccuracy and efficiency in handling slide-level pathological image problems,\neffectively addressing the challenges of deploying large medical models.\nSpecifically, it achieves a 4.33% increase in ACC and a 5.2% increase in AUC\ncompared to the large model BROW on the TCGA-NSCLC and TCGA-BRCA datasets. The\nanalysis of model inference efficiency highlights the high efficiency of the\ndistillation fine-tuning method.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"EFCM: Efficient Fine-tuning on Compressed Models for deployment of large models in medical image analysis\",\"authors\":\"Shaojie Li, Zhaoshuo Diao\",\"doi\":\"arxiv-2409.11817\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The recent development of deep learning large models in medicine shows\\nremarkable performance in medical image analysis and diagnosis, but their large\\nnumber of parameters causes memory and inference latency challenges. Knowledge\\ndistillation offers a solution, but the slide-level gradients cannot be\\nbackpropagated for student model updates due to high-resolution pathological\\nimages and slide-level labels. This study presents an Efficient Fine-tuning on\\nCompressed Models (EFCM) framework with two stages: unsupervised feature\\ndistillation and fine-tuning. In the distillation stage, Feature Projection\\nDistillation (FPD) is proposed with a TransScan module for adaptive receptive\\nfield adjustment to enhance the knowledge absorption capability of the student\\nmodel. In the slide-level fine-tuning stage, three strategies (Reuse CLAM,\\nRetrain CLAM, and End2end Train CLAM (ETC)) are compared. Experiments are\\nconducted on 11 downstream datasets related to three large medical models:\\nRETFound for retina, MRM for chest X-ray, and BROW for histopathology. 
The\\nexperimental results demonstrate that the EFCM framework significantly improves\\naccuracy and efficiency in handling slide-level pathological image problems,\\neffectively addressing the challenges of deploying large medical models.\\nSpecifically, it achieves a 4.33% increase in ACC and a 5.2% increase in AUC\\ncompared to the large model BROW on the TCGA-NSCLC and TCGA-BRCA datasets. The\\nanalysis of model inference efficiency highlights the high efficiency of the\\ndistillation fine-tuning method.\",\"PeriodicalId\":501130,\"journal\":{\"name\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11817\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11817","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recent large deep learning models in medicine show remarkable performance in medical image analysis and diagnosis, but their large number of parameters causes memory and inference-latency challenges. Knowledge distillation offers a solution, yet slide-level gradients cannot be backpropagated to update the student model, because pathological images are high-resolution and carry only slide-level labels. This study presents an Efficient Fine-tuning on Compressed Models (EFCM) framework with two stages: unsupervised feature distillation and fine-tuning. In the distillation stage, Feature Projection Distillation (FPD) is proposed with a TransScan module that adaptively adjusts the receptive field to enhance the student model's knowledge absorption. In the slide-level fine-tuning stage, three strategies are compared: Reuse CLAM, Retrain CLAM, and End2end Train CLAM (ETC). Experiments are conducted on 11 downstream datasets related to three large medical models: RETFound for retina, MRM for chest X-ray, and BROW for histopathology. The results demonstrate that the EFCM framework significantly improves accuracy and efficiency on slide-level pathological image problems, effectively addressing the challenges of deploying large medical models. Specifically, it achieves a 4.33% increase in ACC and a 5.2% increase in AUC over the large model BROW on the TCGA-NSCLC and TCGA-BRCA datasets. An analysis of model inference efficiency highlights the efficiency of the distillation fine-tuning method.
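To make the two-stage pipeline concrete, here is a minimal sketch of the unsupervised feature-distillation stage. The abstract gives no implementation details, so everything below is an assumption: MultiScaleProjector is a hypothetical stand-in for the TransScan module (parallel convolutions with learned mixing weights as a crude form of adaptive receptive-field adjustment), and plain MSE feature matching stands in for the FPD loss.

```python
# A minimal sketch of the unsupervised feature-distillation stage, assuming a
# frozen teacher and a compact student that both map images to feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleProjector(nn.Module):
    """Hypothetical stand-in for the TransScan module: projects student
    features to the teacher's channel width while mixing several receptive
    fields through parallel convolutions with learned softmax weights."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (1, 3, 5)]
        )
        self.mix = nn.Parameter(torch.zeros(len(self.branches)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.mix, dim=0)
        return sum(wi * branch(x) for wi, branch in zip(w, self.branches))

def distillation_step(teacher, student, projector, images, optimizer):
    """One unsupervised step: no labels are used, the student simply
    matches the frozen teacher's feature maps through the projector."""
    with torch.no_grad():
        t_feat = teacher(images)            # (B, C_t, H, W), teacher frozen
    s_feat = projector(student(images))     # projected to (B, C_t, H, W)
    loss = F.mse_loss(s_feat, t_feat)       # simple feature-matching loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because this stage needs only teacher features as targets, it requires no annotations at all, which is what makes the distillation unsupervised.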
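The three slide-level fine-tuning strategies differ mainly in which parameters receive gradients. The sketch below contrasts them under stated assumptions: CLAMHead is a simplified attention-MIL classifier standing in for the actual CLAM model, and the optimizer choices and learning rates are illustrative, not the authors' settings.

```python
# An illustrative contrast of the three slide-level fine-tuning strategies.
# CLAMHead is a toy attention-MIL classifier, not the real CLAM model.
import torch
import torch.nn as nn

class CLAMHead(nn.Module):
    """Toy attention-MIL head: a bag of patch features (N, D) -> class logits."""
    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, 128), nn.Tanh(), nn.Linear(128, 1))
        self.cls = nn.Linear(dim, n_classes)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        a = torch.softmax(self.attn(patch_feats), dim=0)  # (N, 1) patch attention
        slide_feat = (a * patch_feats).sum(dim=0)         # (D,) pooled slide feature
        return self.cls(slide_feat)

def configure_strategy(student: nn.Module, head: CLAMHead, strategy: str):
    """Return the optimizer (or None) implied by each strategy."""
    if strategy == "reuse":
        # Reuse CLAM: keep the head trained on teacher features; nothing trains.
        return None
    if strategy == "retrain":
        # Retrain CLAM: train a fresh head on frozen student features.
        return torch.optim.Adam(head.parameters(), lr=1e-4)
    if strategy == "etc":
        # End2end Train CLAM: slide-level gradients also update the student,
        # affordable only because the distilled student is compact.
        return torch.optim.Adam(
            list(student.parameters()) + list(head.parameters()), lr=1e-5
        )
    raise ValueError(f"unknown strategy: {strategy}")
```

The point of ETC is that once the student is small enough, slide-level gradients can flow end to end through both the MIL head and the backbone, which is exactly what the original large model could not afford.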