HiCervix: An Extensive Hierarchical Dataset and Benchmark for Cervical Cytology Classification

IEEE Transactions on Medical Imaging · Pub Date: 2024-06-26 · DOI: 10.1109/TMI.2024.3419697

De Cai;Jie Chen;Junhan Zhao;Yuan Xue;Sen Yang;Wei Yuan;Min Feng;Haiyan Weng;Shuguang Liu;Yulong Peng;Junyou Zhu;Kanran Wang;Christopher Jackson;Hongping Tang;Junzhou Huang;Xiyue Wang

Vol. 43, No. 12, pp. 4344-4355 · https://ieeexplore.ieee.org/document/10571965/

Abstract

Cervical cytology is a critical screening strategy for the early detection of pre-cancerous and cancerous cervical lesions. The central challenge is accurately classifying the many cervical cytology cell types. Existing automated cervical cytology methods are primarily trained on databases that cover a narrow range of coarse-grained cell types, so they cannot support a comprehensive, fine-grained performance analysis that reflects real-world cytopathology conditions. To overcome these limitations, we introduce HiCervix, the most extensive multi-center cervical cytology dataset currently available to the public. HiCervix includes 40,229 cervical cells from 4,496 whole slide images, categorized into 29 annotated classes. These classes are organized in a three-level hierarchical tree that captures fine-grained subtype information. To exploit the semantic correlations inherent in this tree, we propose HierSwin, a hierarchical vision transformer-based classification network. HierSwin serves as a benchmark for detailed feature learning in both coarse-level and fine-level cervical cancer classification tasks. In comprehensive experiments, HierSwin achieved 92.08% accuracy for coarse-level classification and 82.93% accuracy averaged across all three levels. Compared with board-certified cytopathologists, HierSwin achieved a higher averaged accuracy (0.8293 versus 0.7359), highlighting its potential for clinical applications. The newly released HiCervix dataset, together with the HierSwin benchmark method, is poised to advance deep learning algorithms for rapid cervical cancer screening and to improve cancer prevention and patient outcomes in real-world clinical settings.
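The "accuracy averaged across all three levels" can be made concrete with a small sketch. It assumes that every sample carries a fine-level label, that the coarser labels are recovered by walking up the tree, and that the three per-level accuracies are averaged without weights. The class names (`A`, `A1`, `A1a`, ...) and the helpers `path_to_root`, `per_level_accuracy`, and `averaged_accuracy` are hypothetical placeholders, not the actual HiCervix classes or the authors' evaluation code.

```python
from typing import Dict, List, Sequence

# Hypothetical three-level tree (NOT the real HiCervix classes):
# each fine (level-3) class maps to a level-2 parent, and each
# level-2 class maps to a level-1 (coarse) parent.
FINE_TO_MID: Dict[str, str] = {
    "A1a": "A1", "A1b": "A1", "A2a": "A2",
    "B1a": "B1", "B1b": "B1",
}
MID_TO_COARSE: Dict[str, str] = {"A1": "A", "A2": "A", "B1": "B"}


def path_to_root(fine_label: str) -> List[str]:
    """Return [level1, level2, level3] labels for a fine-level class."""
    mid = FINE_TO_MID[fine_label]
    coarse = MID_TO_COARSE[mid]
    return [coarse, mid, fine_label]


def per_level_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> List[float]:
    """Accuracy at each level, derived by mapping fine labels to their ancestors."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = [0, 0, 0]
    for t, p in zip(y_true, y_pred):
        # Compare the true and predicted labels level by level.
        for level, (tl, pl) in enumerate(zip(path_to_root(t), path_to_root(p))):
            if tl == pl:
                correct[level] += 1
    return [c / len(y_true) for c in correct]


def averaged_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the three per-level accuracies (an assumption)."""
    accs = per_level_accuracy(y_true, y_pred)
    return sum(accs) / len(accs)


if __name__ == "__main__":
    y_true = ["A1a", "A1b", "B1a", "A2a"]
    y_pred = ["A1a", "A1a", "B1b", "A1a"]
    print(per_level_accuracy(y_true, y_pred))  # [1.0, 0.75, 0.25]
    print(averaged_accuracy(y_true, y_pred))
```

In the toy example every prediction lands in the correct coarse class, so level-1 accuracy is 1.0 even though only one of four fine labels is right. This illustrates why a hierarchical benchmark reports per-level and averaged accuracy rather than fine-level accuracy alone. If some cells are annotated only at a coarser level, a real evaluation would need to score each level over the samples labeled at that level; this sketch does not handle that case.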
