Beyond Foundation Models: Distilling Geometric Priors for Lightweight Monocular Depth Estimation in Endoscopy.

Kejin Zhu, Shuwei Shao, Yongming Yang, Zhongyu Tian, Baochang Zhang, Zhe Min
Journal: IEEE Transactions on Medical Imaging, vol. PP. Publication type: Journal Article. Publication date: 2026-05-05. DOI: 10.1109/TMI.2026.3690379 (https://doi.org/10.1109/TMI.2026.3690379).

Abstract

Geometric foundation models have recently demonstrated remarkable performance in depth estimation, benefiting from large-scale training data that enables them to learn intricate geometric structures and spatial dependencies. However, their large parameter counts and high computational complexity make it difficult to meet the efficiency requirements of downstream surgical applications. Consequently, designing a high-performance yet lightweight monocular depth estimator has become a focal point of research. To this end, we harness the rich geometric priors encoded in geometric foundation models and introduce a novel trinity distillation scheme that transfers geometric knowledge along three complementary dimensions, namely spatial, spectral, and gradient, into a compact depth estimator. To further enhance prediction quality, we develop a semantic distribution alignment strategy that suppresses pseudo-texture artifacts arising from the limited semantic representation capability of the lightweight estimator. Extensive experiments on the SCARED, SERV-CT, Hamlyn, and C3VD datasets demonstrate that the proposed method surpasses or performs comparably to previous state-of-the-art competitors, with a smaller model size and reduced computational overhead. Code will be available at: https://github.com/ShuweiShao/LiteNet.
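The abstract names three distillation dimensions (spatial, spectral, gradient) but does not give the loss definitions. As a rough illustration only, the sketch below shows one plausible way a student depth map could be compared against a teacher depth map along those three axes. Every function name, the use of L1 distances, the FFT-magnitude comparison, the finite-difference gradients, and the weights are assumptions made for this example; none of them are taken from the paper or its code.

```python
import numpy as np


def spatial_term(student: np.ndarray, teacher: np.ndarray) -> float:
    """Mean absolute per-pixel difference (assumed form, not from the paper)."""
    return float(np.mean(np.abs(student - teacher)))


def spectral_term(student: np.ndarray, teacher: np.ndarray) -> float:
    """Mean absolute difference of 2D FFT magnitudes (assumed form)."""
    s_mag = np.abs(np.fft.fft2(student))
    t_mag = np.abs(np.fft.fft2(teacher))
    return float(np.mean(np.abs(s_mag - t_mag)))


def gradient_term(student: np.ndarray, teacher: np.ndarray) -> float:
    """Mean absolute difference of finite-difference gradients (assumed form)."""
    dx = np.abs(np.diff(student, axis=1) - np.diff(teacher, axis=1))
    dy = np.abs(np.diff(student, axis=0) - np.diff(teacher, axis=0))
    return float(dx.mean() + dy.mean())


def trinity_loss(
    student: np.ndarray,
    teacher: np.ndarray,
    w_spatial: float = 1.0,   # hypothetical weights, chosen for illustration
    w_spectral: float = 1.0,
    w_gradient: float = 1.0,
) -> float:
    """Weighted sum of the three illustrative terms."""
    return (
        w_spatial * spatial_term(student, teacher)
        + w_spectral * spectral_term(student, teacher)
        + w_gradient * gradient_term(student, teacher)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_depth = rng.random((8, 8))     # stand-in for a foundation-model prediction
    student_depth = teacher_depth + 0.5    # student that is off by a constant offset
    print("spatial :", spatial_term(student_depth, teacher_depth))
    print("spectral:", spectral_term(student_depth, teacher_depth))
    print("gradient:", gradient_term(student_depth, teacher_depth))
    print("total   :", trinity_loss(student_depth, teacher_depth))
```

In this toy case a constant depth offset is penalized by the spatial and spectral terms but leaves the gradient term at essentially zero, which illustrates why the three views can be considered complementary. A training implementation would use a differentiable framework rather than NumPy.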
