A novel dataset for nuclei and tissue segmentation in melanoma with baseline nuclei segmentation and tissue segmentation benchmarks.

IF 11.8, Zone 2 (Biology), Q1 (Multidisciplinary Sciences)

GigaScience. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giaf011

Mark Schuiveling, Hong Liu, Daniel Eek, Gerben E Breimer, Karijn P M Suijkerbuijk, Willeke A M Blokx, Mitko Veta

Abstract

Background: Melanoma is an aggressive form of skin cancer in which tumor-infiltrating lymphocytes (TILs) are a biomarker for recurrence and treatment response. Manual TIL assessment is prone to interobserver variability, and current deep learning models are either not publicly accessible or perform poorly. Deep learning models, however, could provide a consistent spatial evaluation of TILs and other immune cell subsets, which may improve their prognostic and predictive value. To make the development of these models possible, we created the Panoptic Segmentation of nUclei and tissue in advanced MelanomA (PUMA) dataset and assessed the performance of several state-of-the-art deep learning models. In addition, we show how to improve model performance further with heuristic postprocessing, in which nuclei classes are updated based on their tissue localization.

Results: The PUMA dataset includes 155 primary and 155 metastatic melanoma regions of interest, stained with hematoxylin and eosin and annotated for nuclei and tissue, all from a single melanoma referral institution. The Hover-NeXt model, trained on the PUMA dataset, demonstrated the best performance for lymphocyte detection, approaching human interobserver agreement. In addition, heuristic postprocessing of deep learning model output improved the detection of less common classes, such as epithelial nuclei.

Conclusion: The PUMA dataset is the first melanoma-specific dataset that can be used to develop melanoma-specific nuclei and tissue segmentation models. These models can, in turn, be used for prognostic and predictive biomarker development. Combining tissue segmentation with nuclei segmentation is a step toward improved deep learning nuclei segmentation performance. To support the development of these models, this dataset is used in the PUMA challenge.
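
The heuristic postprocessing mentioned in the abstract (updating nuclei classes based on their tissue localization) could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the function name `relabel_nuclei_by_tissue`, the class names, and the single rule in `RULES` are all hypothetical, and the abstract does not say which rules were actually used. The sketch assigns each nucleus the tissue class that covers most of its pixels, then looks up a (predicted class, tissue) pair in a rule table.

```python
import numpy as np

# Hypothetical rule table, for illustration only: a nucleus predicted as
# "tumor" that lies mostly inside "epidermis" tissue is relabeled "epithelium".
RULES = {("tumor", "epidermis"): "epithelium"}


def relabel_nuclei_by_tissue(instance_map, nucleus_classes, tissue_map,
                             tissue_names, rules=RULES):
    """Return a copy of nucleus_classes updated from tissue context.

    instance_map:    2D int array, 0 = background, k > 0 = pixels of nucleus k.
    nucleus_classes: dict mapping nucleus id -> predicted class name.
    tissue_map:      2D array of non-negative tissue indices, same shape.
    tissue_names:    sequence mapping tissue index -> tissue class name.
    """
    if instance_map.shape != tissue_map.shape:
        raise ValueError("instance_map and tissue_map must have the same shape")

    updated = dict(nucleus_classes)
    for nucleus_id, predicted in nucleus_classes.items():
        mask = instance_map == nucleus_id
        if not mask.any():
            continue  # nucleus id has no pixels; keep the prediction
        # Majority vote over the tissue labels under the nucleus mask.
        counts = np.bincount(tissue_map[mask], minlength=len(tissue_names))
        tissue = tissue_names[int(counts.argmax())]
        # Apply a rule if one exists; otherwise keep the predicted class.
        updated[nucleus_id] = rules.get((predicted, tissue), predicted)
    return updated


# Toy example: three nuclei on a 4x4 patch with two tissue classes.
instance_map = np.array([[1, 1, 0, 2],
                         [1, 1, 0, 2],
                         [0, 0, 0, 2],
                         [3, 3, 0, 0]])
tissue_map = np.array([[1, 1, 1, 0],
                       [1, 1, 0, 0],
                       [0, 0, 0, 0],
                       [1, 1, 1, 1]])
tissue_names = ["tumor", "epidermis"]
predicted = {1: "tumor", 2: "tumor", 3: "lymphocyte"}

result = relabel_nuclei_by_tissue(instance_map, predicted, tissue_map, tissue_names)
print(result)
```

In this toy patch only nucleus 1 changes class, because it is the only one matching a rule. A majority vote over the mask is one simple way to assign a tissue class to a nucleus; sampling the tissue label at the nucleus centroid would be a cheaper alternative.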

Source journal: GigaScience (Multidisciplinary Sciences)

CiteScore: 15.50

Self-citation rate: 1.10%

Articles published: 119

Review time: 1 week

About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.