Beyond Foundation Models: Distilling Geometric Priors for Lightweight Monocular Depth Estimation in Endoscopy.

IEEE Transactions on Medical Imaging. Pub Date: 2026-05-05. DOI: 10.1109/TMI.2026.3690379

Kejin Zhu, Shuwei Shao, Yongming Yang, Zhongyu Tian, Baochang Zhang, Zhe Min



Abstract

Geometric foundation models have recently demonstrated remarkable performance in depth estimation, benefiting from large-scale training data that lets them learn intricate geometric structures and spatial dependencies. However, their large parameter counts and high computational complexity make it difficult to meet the efficiency requirements of downstream surgical applications. Designing a high-performance yet lightweight monocular depth estimator has therefore become a focal point of research. To this end, we harness the rich geometric priors encoded in geometric foundation models and introduce a novel trinity distillation scheme that transfers geometric knowledge into a compact depth estimator across three complementary dimensions: spatial, spectral, and gradient. To further enhance prediction quality, we develop a semantic distribution alignment strategy that suppresses pseudo-texture artifacts arising from the limited semantic representation capability of the lightweight estimator. Extensive experiments on the SCARED, SERV-CT, Hamlyn, and C3VD datasets demonstrate that the proposed method surpasses or performs comparably to previous state-of-the-art competitors, with a smaller model size and reduced computational overhead. Code will be available at: https://github.com/ShuweiShao/LiteNet.
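The abstract names the three distillation dimensions but does not give the loss formulas. As a rough illustration only, the sketch below shows one plausible way a teacher depth map could supervise a student along spatial, spectral, and gradient axes. Every function name, the L1 penalties, the FFT-amplitude comparison, the finite-difference gradients, and the equal weights are assumptions made for this example, not the authors' formulation.

```python
import numpy as np


def spatial_term(student, teacher):
    """Assumed spatial term: mean absolute per-pixel depth difference."""
    return float(np.mean(np.abs(student - teacher)))


def spectral_term(student, teacher):
    """Assumed spectral term: mean absolute difference of 2D FFT amplitudes."""
    s_amp = np.abs(np.fft.fft2(student, norm="ortho"))
    t_amp = np.abs(np.fft.fft2(teacher, norm="ortho"))
    return float(np.mean(np.abs(s_amp - t_amp)))


def gradient_term(student, teacher):
    """Assumed gradient term: L1 gap between finite-difference depth gradients."""
    dx = np.diff(student, axis=1) - np.diff(teacher, axis=1)
    dy = np.diff(student, axis=0) - np.diff(teacher, axis=0)
    return float(np.mean(np.abs(dx)) + np.mean(np.abs(dy)))


def trinity_distillation_loss(student, teacher, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three hypothetical terms (weights are placeholders)."""
    w_spatial, w_spectral, w_gradient = weights
    return (
        w_spatial * spatial_term(student, teacher)
        + w_spectral * spectral_term(student, teacher)
        + w_gradient * gradient_term(student, teacher)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_depth = rng.random((32, 40))  # stand-in for a foundation-model prediction
    student_depth = teacher_depth + 0.05 * rng.standard_normal((32, 40))
    print(trinity_distillation_loss(student_depth, teacher_depth))
```

In this toy version the three terms respond to different errors: a constant depth offset is penalized by the spatial term and the zero-frequency part of the spectral term, but leaves the gradient term at zero, which is one way to read "complementary" here.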


