OralEpitheliumDB: A Dataset for Oral Epithelial Dysplasia Image Segmentation and Classification.

Journal of Imaging Informatics in Medicine. Pub Date: 2024-08-01. Epub Date: 2024-02-26. DOI: 10.1007/s10278-024-01041-w

Adriano Barbosa Silva, Alessandro Santana Martins, Thaína Aparecida Azevedo Tosta, Adriano Mota Loyola, Sérgio Vitorino Cardoso, Leandro Alves Neves, Paulo Rogério de Faria, Marcelo Zanchetta do Nascimento



Abstract

Early diagnosis of potentially malignant disorders, such as oral epithelial dysplasia, is the most reliable way to prevent oral cancer. Computational algorithms have been used as auxiliary tools to aid specialists in this process. However, experiments are usually performed on private data, which makes the results difficult to reproduce. Several public datasets of histological images exist, but studies focused on oral dysplasia rely on inaccessible datasets, which hinders the improvement of algorithms aimed at this lesion. This study introduces an annotated public dataset of oral epithelial dysplasia tissue images. The dataset includes 456 images acquired from 30 mouse tongues. The images were categorized by lesion grade, with nuclear structures manually marked by a trained specialist and validated by a pathologist. Experiments were also carried out to illustrate the potential of the proposed dataset in the classification and segmentation tasks commonly explored in the literature. Convolutional neural network (CNN) models for semantic and instance segmentation were applied to the images, which were pre-processed with stain normalization methods. The segmented and non-segmented images were then classified with CNN architectures and machine learning algorithms. The data obtained through these processes are available in the dataset. The segmentation stage yielded an F1-score of 0.83, obtained with a U-Net model using ResNet-50 as a backbone. In the classification stage, the strongest result was achieved with the Random Forest method, with an accuracy of 94.22%. The results show that segmentation contributed to the classification results, but further studies are needed to improve these stages of automated diagnosis. The original, gold-standard, normalized, and segmented images are publicly available and may be used to improve clinical applications of CAD methods on oral epithelial dysplasia tissue images.
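The abstract reports segmentation quality as an F1-score measured against the gold-standard nuclei masks. As a rough illustration of what that metric measures, the sketch below computes a pixel-wise F1-score between two binary masks. It is not the authors' evaluation code: the abstract does not state the exact protocol (for example, pixel-level versus object-level matching), and the function name `f1_score_masks`, the toy masks, and the empty-mask convention are all assumptions made for this example.

```python
from typing import Sequence


def f1_score_masks(
    pred: Sequence[Sequence[int]], gold: Sequence[Sequence[int]]
) -> float:
    """Pixel-wise F1-score between two binary masks of equal shape.

    Hypothetical helper for illustration; nonzero pixels count as nuclei.
    """
    if len(pred) != len(gold):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, gold_row in zip(pred, gold):
        if len(pred_row) != len(gold_row):
            raise ValueError("masks must have the same shape")
        for p, g in zip(pred_row, gold_row):
            if p and g:
                tp += 1  # nucleus pixel found by both
            elif p:
                fp += 1  # predicted nucleus, gold says background
            elif g:
                fn += 1  # gold nucleus missed by the prediction
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


# Toy 3x4 masks (made-up data, not taken from the dataset).
gold_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred_mask = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

score = f1_score_masks(pred_mask, gold_mask)
print(f"F1 = {score:.2f}")  # 3 TP, 1 FP, 1 FN -> F1 = 0.75
```

For a binary mask, this pixel-wise F1 is the same quantity as the Dice coefficient, 2TP / (2TP + FP + FN), so a score such as 0.83 can be read as the degree of overlap between predicted and annotated nuclei.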


