Cephalometric landmark detection using vision transformers with direct coordinate prediction

Impact factor 2.1; Zone 2, Medicine; JCR Q2, Dentistry, Oral Surgery & Medicine
Filipe Laitenberger, Hannah T. Scheuer, Hanna A. Scheuer, Enno Lilienthal, Shaodi You, Reinhard E. Friedrich
Journal of Cranio-Maxillofacial Surgery, volume 53, issue 9, pages 1518-1529. Journal article, published 2025-07-01.
DOI: 10.1016/j.jcms.2025.05.021
https://www.sciencedirect.com/science/article/pii/S1010518225001866
Citation count: 0

Abstract

Cephalometric Landmark Detection (CLD), i.e. annotating interest points in lateral X-ray images, is the crucial first step of every orthodontic therapy. While CLD has immense potential for automation using Deep Learning methods, carefully crafted contemporary approaches using convolutional neural networks and heatmap prediction do not qualify for large-scale clinical application due to insufficient performance. We propose a novel approach using Vision Transformers (ViTs) with direct coordinate prediction, avoiding the memory-intensive heatmap prediction common in previous work. Through extensive ablation studies comparing our method against contemporary CNN architectures (ConvNext V2) and heatmap-based approaches (Segformer), we demonstrate that ViTs with coordinate prediction achieve superior performance with more than 2 mm improvement in mean radial error compared to state-of-the-art CLD methods. Our results show that while non-adapted CNN architectures perform poorly on the given task, contemporary approaches may be too tailored to specific datasets, failing to generalize to different and especially sparse datasets. We conclude that using general-purpose Vision Transformers with direct coordinate prediction shows great promise for future research on CLD and medical computer vision.
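The abstract reports results as mean radial error (MRE) in millimetres and contrasts direct coordinate prediction with memory-intensive heatmap prediction. The sketch below illustrates both ideas in plain Python. It is not the authors' code: the function names, the `mm_per_pixel` scale, the sample landmark values, and the 19-landmark, 512 x 512 setting are all assumptions chosen for illustration, since the abstract does not state the dataset or image resolution.

```python
import math


def mean_radial_error(pred, target, mm_per_pixel=1.0):
    """Average Euclidean distance between predicted and reference landmarks.

    pred and target are equal-length lists of (x, y) pixel coordinates.
    mm_per_pixel is an assumed pixel spacing used to convert pixels to mm.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += math.hypot(px - tx, py - ty)
    return mm_per_pixel * total / len(pred)


def output_values_per_image(num_landmarks, height, width):
    """Count the scalars a model emits per image under each output scheme.

    A heatmap head produces one height x width map per landmark, while a
    coordinate head produces only an (x, y) pair per landmark.
    """
    return {
        "heatmap": num_landmarks * height * width,
        "coordinates": num_landmarks * 2,
    }


# Hypothetical predictions and reference annotations for two landmarks.
pred = [(10.0, 10.0), (23.0, 44.0)]
target = [(13.0, 14.0), (23.0, 40.0)]
mre = mean_radial_error(pred, target, mm_per_pixel=0.1)  # about 0.45 mm

# Hypothetical setting: 19 landmarks on a 512 x 512 input.
sizes = output_values_per_image(19, 512, 512)
print(f"MRE: {mre:.2f} mm")
print(f"heatmap outputs: {sizes['heatmap']}, coordinate outputs: {sizes['coordinates']}")
```

Under these assumed numbers the heatmap head emits several million values per image while the coordinate head emits 38, which gives a sense of the memory argument made in the abstract.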
Source journal
CiteScore: 5.20
Self-citation rate: 22.60%
Articles published: 117
Review time: 70 days
About the journal: The Journal of Cranio-Maxillofacial Surgery publishes articles covering all aspects of surgery of the head, face and jaw. Specific topics covered recently have included distraction osteogenesis, synthetic bone substitutes, fibroblast growth factors, fetal wound healing, skull base surgery, computer-assisted surgery, and vascularized bone grafts.