Publicly Available Dental Image Datasets for Artificial Intelligence.

IF 5.9 | Zone 1 (Medicine) | Q1 DENTISTRY, ORAL SURGERY & MEDICINE

Journal of Dental Research Pub Date : 2024-10-18 DOI:10.1177/00220345241272052

S E Uribe, J Issa, F Sohrabniya, A Denny, N N Kim, A F Dayo, A Chaurasia, A Sofi-Mahmudi, M Büttner, F Schwendicke


Citations: 0

Abstract

The development of artificial intelligence (AI) in dentistry requires large and well-annotated datasets. However, the availability of public dental imaging datasets remains unclear. This study aimed to provide a comprehensive overview of all publicly available dental imaging datasets to address this gap and support AI development. This observational study searched all publicly available dataset resources (academic databases, preprints, and AI challenges), focusing on datasets/articles from 2020 to 2023, with PubMed searches extending back to 2011. We comprehensively searched for dental AI datasets containing images (intraoral photos, scans, radiographs, etc.) using relevant keywords. We included datasets of >50 images obtained from publicly available sources. We extracted dataset characteristics, patient demographics, country of origin, dataset size, ethical clearance, image details, FAIRness metrics, and metadata completeness. We screened 131,028 records and extracted 16 unique dental imaging datasets. The datasets were obtained from Kaggle (18.8%), GitHub, Google, Mendeley, PubMed, Zenodo (each 12.5%), Grand-Challenge, OSF, and arXiv (each 6.25%). The primary focus was tooth segmentation (62.5%) and labeling (56.2%). Panoramic radiography was the most common imaging modality (58.8%). Of the 13 countries, China contributed the most images (2,413). Of the datasets, 75% contained annotations, whereas the methods used to establish labels were often unclear and inconsistent. Only 31.2% of the datasets reported ethical approval, and 56.25% did not specify a license. Most data were obtained from dental clinics (50%). Intraoral radiographs had the highest findability score in the FAIR assessment, whereas cone-beam computed tomography datasets scored the lowest in all categories. These findings revealed a scarcity of publicly available imaging dental data and inconsistent metadata reporting. 
To promote the development of robust, equitable, and generalizable AI tools for dental diagnostics, treatment, and research, efforts are needed to address data scarcity, increase diversity, mandate metadata completeness, and ensure FAIRness in AI dental imaging research.
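
The percentages in the abstract can be related back to the 16 included datasets with a little arithmetic. The sketch below is only an illustration: the whole-number counts are back-calculated from the reported percentages and are not stated in the abstract, and the helper names (`implied_count`, `is_consistent`) and the rounding tolerance are assumptions made for this example.

```python
N_DATASETS = 16  # number of unique datasets reported in the abstract

# Share of datasets per source, as reported in the abstract (percent).
SOURCE_PERCENT = {
    "Kaggle": 18.8,
    "GitHub": 12.5,
    "Google": 12.5,
    "Mendeley": 12.5,
    "PubMed": 12.5,
    "Zenodo": 12.5,
    "Grand-Challenge": 6.25,
    "OSF": 6.25,
    "arXiv": 6.25,
}


def implied_count(percent, n=N_DATASETS):
    """Nearest whole number of items that a reported percentage implies."""
    return round(percent / 100 * n)


def is_consistent(percent, n=N_DATASETS, tolerance=0.06):
    """Check whether a percentage matches a whole count out of n.

    The tolerance is an assumption: slightly more than half of the last
    reported decimal place, to allow for rounding in the abstract.
    """
    count = implied_count(percent, n)
    return abs(count / n * 100 - percent) <= tolerance


# Back-calculated (not reported) dataset counts per source.
source_counts = {name: implied_count(p) for name, p in SOURCE_PERCENT.items()}

if __name__ == "__main__":
    for name, count in source_counts.items():
        print(f"{name}: {count}")
    print("total:", sum(source_counts.values()))
    # Other figures from the abstract, checked against n = 16.
    for label, pct in [("segmentation", 62.5), ("labeling", 56.2),
                       ("annotated", 75.0), ("ethical approval", 31.2),
                       ("no license", 56.25), ("panoramic", 58.8)]:
        print(label, implied_count(pct), is_consistent(pct))
```

Under these assumptions the source shares add up to 16 datasets, and the focus, annotation, ethics, and license figures each correspond to a whole number of datasets. The 58.8% reported for panoramic radiography does not match any whole count out of 16, which suggests that imaging modality was tallied over a different denominator; the abstract does not say which.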

Source journal

Journal of Dental Research (Medicine: Dentistry and Oral Surgery)

CiteScore: 15.30
Self-citation rate: 3.90%
Articles published: 155
Review time: 3-8 weeks

Journal description: The Journal of Dental Research (JDR) is a peer-reviewed scientific journal committed to sharing new knowledge and information on all sciences related to dentistry and the oral cavity, covering health and disease. With monthly publications, JDR ensures timely communication of the latest research to the oral and dental community.