Title: Beyond Foundation Models: Distilling Geometric Priors for Lightweight Monocular Depth Estimation in Endoscopy
Authors: Kejin Zhu, Shuwei Shao, Yongming Yang, Zhongyu Tian, Baochang Zhang, Zhe Min
Journal: IEEE Transactions on Medical Imaging, vol. PP
Publication type: Journal Article
Publication date: 2026-05-05
DOI: 10.1109/TMI.2026.3690379 (https://doi.org/10.1109/TMI.2026.3690379)
Citations: 0
Abstract
Geometric foundation models have recently demonstrated remarkable performance in depth estimation, benefiting from large-scale training data that enables them to learn intricate geometric structures and spatial dependencies. However, their large parameter counts and high computational cost make it difficult to meet the efficiency requirements of downstream surgical applications. Consequently, designing a high-performance yet lightweight monocular depth estimator has become a focal point of research. To this end, we harness the rich geometric priors encoded in geometric foundation models and introduce a novel trinity distillation scheme that transfers geometric knowledge along three complementary dimensions (spatial, spectral, and gradient) into a compact depth estimator. To further enhance prediction quality, we develop a semantic distribution alignment strategy that suppresses pseudo-texture artifacts arising from the limited semantic representation capability of the lightweight estimator. Extensive experiments on the SCARED, SERV-CT, Hamlyn, and C3VD datasets demonstrate that the proposed method surpasses or performs comparably to previous state-of-the-art methods, with a smaller model size and reduced computational overhead. Code will be available at: https://github.com/ShuweiShao/LiteNet.
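The abstract names three distillation dimensions (spatial, spectral, and gradient) but does not define the corresponding losses. The sketch below is therefore only an illustration under assumed choices, not the authors' formulation: the spatial term is a per-pixel L1 difference, the spectral term compares amplitude spectra of a 2D discrete Fourier transform, and the gradient term compares finite-difference gradients. All function names and the equal default weights are hypothetical. It uses only the Python standard library so that it runs anywhere; a practical implementation would use an array library with a fast FFT.

```python
import cmath

def mean_abs_diff(a, b):
    """Mean absolute difference between two equally shaped 2D lists."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def dft_1d(seq):
    """Naive orthonormal 1D DFT (O(n^2)); adequate for tiny demo inputs."""
    n = len(seq)
    return [
        sum(seq[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n)) / n ** 0.5
        for u in range(n)
    ]

def amplitude_spectrum(img):
    """Amplitude of the 2D DFT, computed separably over rows, then columns."""
    rows = [dft_1d(r) for r in img]
    cols = [dft_1d(list(c)) for c in zip(*rows)]
    return [[abs(v) for v in r] for r in zip(*cols)]

def gradients(img):
    """Forward finite differences along x (columns) and y (rows)."""
    dx = [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in img]
    dy = [[img[i + 1][j] - img[i][j] for j in range(len(img[0]))]
          for i in range(len(img) - 1)]
    return dx, dy

def spatial_loss(student, teacher):
    # Assumed form: per-pixel L1 distance between depth maps.
    return mean_abs_diff(student, teacher)

def spectral_loss(student, teacher):
    # Assumed form: L1 distance between amplitude spectra.
    return mean_abs_diff(amplitude_spectrum(student), amplitude_spectrum(teacher))

def gradient_loss(student, teacher):
    # Assumed form: L1 distance between finite-difference gradients.
    # Requires maps of at least 2x2 pixels.
    s_dx, s_dy = gradients(student)
    t_dx, t_dy = gradients(teacher)
    return mean_abs_diff(s_dx, t_dx) + mean_abs_diff(s_dy, t_dy)

def trinity_distillation_loss(student, teacher,
                              w_spatial=1.0, w_spectral=1.0, w_gradient=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_spatial * spatial_loss(student, teacher)
            + w_spectral * spectral_loss(student, teacher)
            + w_gradient * gradient_loss(student, teacher))

if __name__ == "__main__":
    teacher = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
    student = [[v + 0.5 for v in row] for row in teacher]  # constant depth offset
    print("spatial :", spatial_loss(student, teacher))
    print("spectral:", spectral_loss(student, teacher))
    print("gradient:", gradient_loss(student, teacher))
    print("total   :", trinity_distillation_loss(student, teacher))
```

In this toy case the student differs from the teacher by a constant offset, which the spatial and spectral terms penalize while the gradient term does not. That is one way to see why the three terms could be complementary, although the paper's actual definitions may differ.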