Task-Ready PanNuke and NuCLS Datasets: Reorganization, Synthetic Data Generation, and Experimental Evaluation
Sai Chandana Koganti; Siri Yellu; Jihoon Yun; Sanghoon Lee
IEEE Access, vol. 13, pp. 125275-125286. Published 2025-07-15. Journal Article.
DOI: 10.1109/ACCESS.2025.3589477
URL: https://ieeexplore.ieee.org/document/11080424/
Journal metrics: impact factor 3.4; JCR Q2 (Computer Science, Information Systems)
Citations: 0
Abstract
Automating nuclei analysis in histopathology is essential for enhancing disease diagnosis; however, training reliable models requires well-structured datasets. This paper addresses the gap in standardized data preparation workflows for two critical histopathology datasets: PanNuke and NuCLS. First, for PanNuke, we organize histopathology images and masks into training-validation splits, extract cell-type-specific subsets, and generate multi-scale patches to enable robust model training across resolutions. Second, for NuCLS, we curate task-specific subsets for object detection and semantic segmentation, ensuring consistency across splits while addressing annotation inconsistencies. Third, we conduct experiments on the two reorganized datasets, including cell-type-specific binary classification, multi-task evaluation, and extension to synthetic datasets. Our workflows address common histopathology data challenges, including fragmented annotations, class imbalance, and mismatched metadata. The processed datasets are shared in standardized formats, allowing researchers to train models directly for critical tasks such as detecting cancerous nuclei or segmenting inflammatory cells in histopathology images.
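The abstract names two preparation steps, a training-validation split and multi-scale patch generation, without giving details. The following is only a minimal sketch of what such steps might look like, not the authors' pipeline. The function names, the 20% validation fraction, the patch sizes (64, 128, 256), and the 256x256 tile size are all assumptions made for illustration.

```python
import random


def train_val_split(sample_ids, val_fraction=0.2, seed=0):
    """Shuffle ids reproducibly and return (train_ids, val_ids).

    val_fraction and seed are illustrative defaults, not values from the paper.
    """
    ids = sorted(sample_ids)            # fixed starting order, independent of input order
    random.Random(seed).shuffle(ids)    # seeded shuffle so the split is repeatable
    n_val = int(round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]


def patch_boxes(height, width, patch_sizes=(64, 128, 256)):
    """Yield (top, left, size) for non-overlapping square patches at each scale.

    Patches that would extend past the image border are skipped.
    """
    for size in patch_sizes:
        for top in range(0, height - size + 1, size):
            for left in range(0, width - size + 1, size):
                yield top, left, size


# Hypothetical usage: 10 sample ids and one 256x256 tile.
train_ids, val_ids = train_val_split(range(10), val_fraction=0.2, seed=0)
boxes = list(patch_boxes(256, 256))
```

Cropping an image and its mask with the same `(top, left, size)` box keeps the two aligned, which is why the sketch produces coordinates rather than pixel data.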
IEEE Access
Categories: Computer Science, Information Systems; Engineering, Electrical & Electronic
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks
Journal description:
IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest.
IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on:
Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals.
Practical articles discussing new experiments or measurement techniques, or interesting solutions to engineering problems.
Development of new or improved fabrication or manufacturing techniques.
Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.