Can Shi, Jinghong Fan, Zhonghan Deng, Huanlin Liu, Qiang Kang, Yumei Li, Jing Guo, Jingwen Wang, Jinjiang Gong, Sha Liao, Ao Chen, Ying Zhang, Mei Li
GigaScience, volume 14. Published 2025-01-06. DOI: 10.1093/gigascience/giaf069 (https://doi.org/10.1093/gigascience/giaf069). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206155/pdf/
CellBinDB: a large-scale multimodal annotated dataset for cell segmentation with benchmarking of universal models.
In recent years, cell segmentation techniques have played a critical role in the analysis of biological images, especially for quantitative studies. Deep learning-based cell segmentation models have demonstrated remarkable performance in segmenting cell and nucleus boundaries, but they are typically tailored to specific modalities or require manual tuning of hyperparameters, limiting their generalizability to unseen data. Comprehensive datasets that support both the training of universal models and the evaluation of various segmentation techniques are essential for overcoming these limitations and promoting the development of more versatile cell segmentation solutions. Here, we present CellBinDB, a large-scale multimodal annotated dataset established for these purposes. CellBinDB contains more than 1,000 annotated images, each labeled to identify the boundaries of cells or nuclei. The images span four staining types: 4',6-diamidino-2-phenylindole (DAPI), single-stranded DNA (ssDNA), hematoxylin and eosin (H&E), and multiplex immunofluorescence (mIF). They cover over 30 normal and diseased tissue types from human and mouse samples. Based on CellBinDB, we benchmarked 8 state-of-the-art and widely used cell segmentation methods. Our further analysis reveals that complex cell shapes reduce segmentation accuracy, while higher image gradients improve boundary detection, offering insights for refining segmentation strategies across diverse imaging scenarios.
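The abstract links segmentation accuracy to image gradients but does not say how either quantity is measured. As a rough illustration only, the sketch below shows one common way such quantities could be computed: a mean forward-difference gradient magnitude for an image, and intersection over union (IoU) between a predicted and a reference mask. The function names `mean_gradient_magnitude` and `iou` are hypothetical, and neither is claimed to be the metric used in the CellBinDB benchmark.

```python
import math


def mean_gradient_magnitude(image):
    """Mean forward-difference gradient magnitude of a 2D grayscale image.

    `image` is a list of equal-length rows of numbers. The last row and
    last column are skipped because they have no forward neighbor.
    Illustrative proxy only, not the paper's definition of image gradient.
    """
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for r in range(rows - 1):
        for c in range(cols - 1):
            gx = image[r][c + 1] - image[r][c]  # horizontal difference
            gy = image[r + 1][c] - image[r][c]  # vertical difference
            total += math.hypot(gx, gy)
            count += 1
    return total / count if count else 0.0


def iou(mask_a, mask_b):
    """Intersection over union of two binary masks of equal shape."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


if __name__ == "__main__":
    ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]  # intensity rises left to right
    pred = [[1, 1, 0], [1, 1, 0]]
    truth = [[0, 1, 1], [0, 1, 1]]
    print(mean_gradient_magnitude(ramp))
    print(iou(pred, truth))
```

Under this sketch, an image with sharper boundaries yields a larger mean gradient magnitude, and a prediction that overlaps the reference mask more closely yields an IoU nearer to 1.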
About the journal:
GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.