ColoPola: A polarimetric imaging dataset for colorectal cancer detection.

Impact factor 11.8; CAS Zone 2 (Biology); JCR Q1 (Multidisciplinary Sciences)
Thi-Thu-Hien Pham, Quoc-Hoang-Quyen Vo, Thao-Vi Nguyen, The-Hiep Nguyen, Quoc-Hung Phan, Thanh-Hai Le
GigaScience, volume 14. Published 2025-01-06. DOI: https://doi.org/10.1093/gigascience/giaf120. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12530094/pdf/

Abstract

Background: In recent years, polarimetric imaging has been developed for various biological applications, including tissue morphological characterization and cancer stage detection. However, to facilitate classification models based on the characteristics of polarization states, it is essential to develop a consistent and standardized dataset of polarimetric images.

Findings: This study presents a dataset of colorectal cancer polarimetric images designated as ColoPola, which is intended to facilitate research efforts in the field. The dataset consists of 572 sample slices (288 healthy and 284 malignant). For each slice, 36 polarimetric images corresponding to different polarization states are provided. Thus, ColoPola contains 20,592 polarimetric images, of which 10,368 correspond to healthy samples and 10,224 to malignant samples. To the best of the authors' knowledge, the dataset is the first of its kind for colorectal cancer images. The practical utility of the dataset is evaluated using 5 models: 3 models constructed from scratch (CNN, CNN_2, and EfficientFormerV2) and 2 pretrained models (DenseNet and EfficientNetV2). For each model, the input has a size of 224 × 224 × 36, corresponding to the width, height, and red channel value of the polarimetric images, respectively.
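The image counts and the 224 × 224 × 36 input shape described above can be checked with a short sketch. This is only an illustration, not the authors' pipeline: it assumes each polarimetric image is an RGB array already resized to 224 × 224 with the red channel at index 0, and the helper names `dataset_counts` and `stack_red_channels` are hypothetical. The pixel data is synthetic.

```python
import numpy as np

# Figures taken from the abstract.
N_HEALTHY = 288
N_MALIGNANT = 284
STATES_PER_SLICE = 36


def dataset_counts():
    """Derive slice and image totals from the per-class slice counts."""
    total_slices = N_HEALTHY + N_MALIGNANT
    return {
        "slices": total_slices,
        "images": total_slices * STATES_PER_SLICE,
        "healthy_images": N_HEALTHY * STATES_PER_SLICE,
        "malignant_images": N_MALIGNANT * STATES_PER_SLICE,
    }


def stack_red_channels(rgb_images):
    """Stack the red channel of 36 H x W x 3 images into one H x W x 36 array.

    Assumption: channel index 0 is red (RGB ordering).
    """
    if len(rgb_images) != STATES_PER_SLICE:
        raise ValueError(f"expected {STATES_PER_SLICE} images, got {len(rgb_images)}")
    return np.stack([img[..., 0] for img in rgb_images], axis=-1)


# Synthetic stand-in for one sample slice: 36 random RGB images.
rng = np.random.default_rng(0)
fake_slice = [
    rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
    for _ in range(STATES_PER_SLICE)
]
model_input = stack_red_channels(fake_slice)

print(dataset_counts())
print(model_input.shape)
```

Under these assumptions, each sample slice becomes a single 36-channel input, so a classifier sees all polarization states of a slice at once rather than one image at a time.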

Conclusions: The results show that the CNN, CNN_2, EfficientFormerV2, DenseNet, and EfficientNetV2 models obtain F1 scores of 0.870, 0.862, 0.908, 0.903, and 0.965, respectively, on the testing set. Among the 5 models, EfficientNetV2 achieves the best performance, with all the performance metrics exceeding 0.95 for both the validation set and the testing set. Overall, the results suggest that ColoPola has significant potential as a polarimetric optical imaging-based diagnostic tool for colorectal cancer in clinical practice.
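For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts and ranks the five models by the testing-set F1 scores quoted above. The function name `f1_score` and the example counts are hypothetical; the abstract does not report per-model confusion matrices.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall), from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Testing-set F1 scores as reported in the abstract.
reported_f1 = {
    "CNN": 0.870,
    "CNN_2": 0.862,
    "EfficientFormerV2": 0.908,
    "DenseNet": 0.903,
    "EfficientNetV2": 0.965,
}

best_model = max(reported_f1, key=reported_f1.get)
print(best_model)

# Hypothetical counts, for illustration only (not from the paper).
example_f1 = f1_score(tp=50, fp=5, fn=5)
print(round(example_f1, 3))
```

Because F1 ignores true negatives, it is usually read alongside accuracy and the other metrics the authors mention.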
