Benchmarking Deep Learning Models for Tooth Structure Segmentation.

Q1 Social Sciences
Gender and Development Pub Date : 2022-10-01 Epub Date: 2022-06-09 DOI:10.1177/00220345221100169
L Schneider, L Arsiwala-Scheppach, J Krois, H Meyer-Lueckel, K K Bressem, S M Niehues, F Schwendicke

Abstract

A wide range of deep learning (DL) architectures with varying depths are available, and developers usually choose one or a few of them for their specific task in a nonsystematic way. Benchmarking (i.e., the systematic comparison of state-of-the-art architectures on a specific task) may provide guidance in the model development process and may allow developers to make better decisions. However, comprehensive benchmarking has not yet been performed in dentistry. We aimed to benchmark a range of architecture designs for 1 specific, exemplary case: tooth structure segmentation on dental bitewing radiographs. We built 72 models for tooth structure (enamel, dentin, pulp, fillings, crowns) segmentation by combining 6 different DL network architectures (U-Net, U-Net++, Feature Pyramid Networks, LinkNet, Pyramid Scene Parsing Network, Mask Attention Network) with 12 encoders from 3 different encoder families (ResNet, VGG, DenseNet) of varying depth (e.g., VGG13, VGG16, VGG19). To each model design, 3 initialization strategies (ImageNet, CheXpert, random initialization) were applied, resulting in a total of 216 trained models, which were trained for up to 200 epochs with the Adam optimizer (learning rate = 0.0001) and a batch size of 32. Our data set consisted of 1,625 human-annotated dental bitewing radiographs. We used a 5-fold cross-validation scheme and quantified model performance primarily by the F1-score. Initialization with ImageNet or CheXpert weights significantly outperformed random initialization (P < 0.05). Deeper and more complex models did not necessarily perform better than less complex alternatives. VGG-based models were more robust across model configurations, while more complex models (e.g., from the ResNet family) achieved peak performances. In conclusion, initializing models with pretrained weights may be recommended when training models for dental radiographic analysis. Less complex model architectures may be competitive alternatives if computational resources and training time are limiting factors. Models developed and found superior on nondental data sets may not show this behavior for dental domain-specific tasks.
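
The experimental design described in the abstract (6 architectures x 12 encoders = 72 model designs, x 3 initializations = 216 trained models, evaluated with 5-fold cross-validation and the F1-score) can be sketched in a few lines of Python. This is only an illustration of the counting and evaluation logic, not the authors' code. The abstract names only the VGG depths, so the ResNet and DenseNet encoder names below are assumptions chosen to fill 12 slots, and the helper names (build_grid, kfold_indices, f1_score) are hypothetical. The folds are contiguous and unshuffled for brevity, and the F1 function handles a single binary mask, whereas the study segments several tooth structures.

```python
from itertools import product

# Architectures as listed in the abstract (short labels are ours).
ARCHITECTURES = ["U-Net", "U-Net++", "FPN", "LinkNet", "PSPNet", "MA-Net"]

# ASSUMPTION: only VGG13/16/19 are named in the abstract; the ResNet and
# DenseNet depths below are illustrative placeholders to reach 12 encoders.
ENCODERS = [
    "resnet18", "resnet34", "resnet50", "resnet101", "resnet152",
    "vgg13", "vgg16", "vgg19",
    "densenet121", "densenet161", "densenet169", "densenet201",
]

INITIALIZATIONS = ["imagenet", "chexpert", "random"]

# Hyperparameters stated in the abstract.
TRAIN_CONFIG = {"optimizer": "adam", "lr": 1e-4, "batch_size": 32, "max_epochs": 200}


def build_grid():
    """Enumerate every (architecture, encoder, initialization) combination."""
    return [
        {"architecture": a, "encoder": e, "init": i}
        for a, e, i in product(ARCHITECTURES, ENCODERS, INITIALIZATIONS)
    ]


def kfold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def f1_score(pred, target):
    """Pixel-wise F1 for one binary mask given as flat sequences of 0/1."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    # Convention chosen here (an assumption): two empty masks count as a match.
    return 1.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    grid = build_grid()
    designs = {(g["architecture"], g["encoder"]) for g in grid}
    print(len(designs), "model designs,", len(grid), "training runs")
    for fold, (train, val) in enumerate(kfold_indices(1625, k=5)):
        print(f"fold {fold}: {len(train)} train / {len(val)} validation images")
```

In an actual benchmark, each entry of the grid would be trained once per fold with the hyperparameters in TRAIN_CONFIG, and the per-fold F1-scores would be aggregated before comparing architectures, encoders, and initialization strategies.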
