ColoPola: A polarimetric imaging dataset for colorectal cancer detection
Thi-Thu-Hien Pham, Quoc-Hoang-Quyen Vo, Thao-Vi Nguyen, The-Hiep Nguyen, Quoc-Hung Phan, Thanh-Hai Le
GigaScience, volume 14 (2025). Journal article, published 2025-01-06. DOI: 10.1093/gigascience/giaf120
PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12530094/pdf/
Abstract
Background: In recent years, polarimetric imaging has been developed for various biological applications, including tissue morphological characterization and cancer stage detection. However, building classification models based on the characteristics of polarization states requires a consistent and standardized dataset of polarimetric images.
Findings: This study presents ColoPola, a dataset of colorectal cancer polarimetric images intended to facilitate research in the field. The dataset consists of 572 sample slices (288 healthy and 284 malignant). For each slice, 36 polarimetric images corresponding to different polarization states are provided. ColoPola therefore contains 20,592 polarimetric images (572 × 36), of which 10,368 correspond to healthy samples and 10,224 to malignant samples. To the best of the authors' knowledge, it is the first dataset of its kind for colorectal cancer. The practical utility of the dataset is evaluated with 5 models: 3 built from scratch (CNN, CNN_2, and EfficientFormerV2) and 2 pretrained (DenseNet and EfficientNetV2). For each model, the input has a size of 224 × 224 × 36: the first two dimensions are the image width and height, and the third holds the red-channel values of the 36 polarimetric images of a slice.
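The input layout described above can be sketched in Python. This is a minimal, stdlib-only sketch, not the authors' preprocessing code. It assumes that each slice's 36 polarimetric images are already loaded as RGB images of identical size (for ColoPola, already resized to 224 × 224; the resizing step is not shown) and represented as nested lists with `image[row][col] == (r, g, b)`. The helper name `stack_red_channels` is hypothetical. It keeps only the red value of each pixel and uses the 36 polarization states as the channel axis. The output is ordered rows × columns × 36, which for square 224 × 224 images has the same shape as the width × height × 36 input stated above. The last lines reproduce the image counts implied by 36 images per slice.

```python
from typing import List, Tuple

# Hypothetical representation: image[row][col] == (r, g, b)
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

NUM_STATES = 36  # polarimetric images (polarization states) per slice


def stack_red_channels(images: List[Image]) -> List[List[List[int]]]:
    """Stack the red channel of 36 same-sized images into an H x W x 36 nested list."""
    if len(images) != NUM_STATES:
        raise ValueError(f"expected {NUM_STATES} images, got {len(images)}")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all polarimetric images of a slice must share one size")
    # out[r][c][k] is the red value of pixel (r, c) in the k-th polarization state
    return [
        [[img[r][c][0] for img in images] for c in range(width)]
        for r in range(height)
    ]


# Image counts implied by 36 polarimetric images per slice
healthy_slices, malignant_slices = 288, 284
print(healthy_slices * NUM_STATES,
      malignant_slices * NUM_STATES,
      (healthy_slices + malignant_slices) * NUM_STATES)  # 10368 10224 20592
```

Keeping only one colour channel per polarization state is what lets the 36 states fit into a single 36-channel tensor rather than a 108-channel RGB stack.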
Conclusions: On the testing set, the CNN, CNN_2, EfficientFormerV2, DenseNet, and EfficientNetV2 models obtain F1 scores of 0.870, 0.862, 0.908, 0.903, and 0.965, respectively. EfficientNetV2 performs best among the 5 models, with every performance metric exceeding 0.95 on both the validation set and the testing set. Overall, the results suggest that ColoPola has significant potential to support polarimetric optical imaging-based diagnosis of colorectal cancer in clinical practice.
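For reference, F1 is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R), which equals 2TP / (2TP + FP + FN). The sketch below is illustrative only. The abstract does not state whether F1 is computed for the malignant class or averaged over both classes, so it shows the binary form, and `f1_score` here is a hypothetical helper, not a library call. Only the five scores in `REPORTED_F1` come from the abstract; ranking them confirms EfficientNetV2 as the top model.

```python
from typing import Dict, List, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 from confusion-matrix counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Testing-set F1 scores as reported in the abstract
REPORTED_F1: Dict[str, float] = {
    "CNN": 0.870,
    "CNN_2": 0.862,
    "EfficientFormerV2": 0.908,
    "DenseNet": 0.903,
    "EfficientNetV2": 0.965,
}

# Sort models from best to worst F1
ranking: List[Tuple[str, float]] = sorted(
    REPORTED_F1.items(), key=lambda kv: kv[1], reverse=True
)
for name, score in ranking:
    print(f"{name:<18} {score:.3f}")
```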
About the journal:
GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.