Yosep Chong, Daseul Park, Youngbin Ahn, Yoonjin Kwak, Seyeon Park, Seung Wan Back, Changwoo Lee, Gyeongsin Park, Mohammad Rizwan Alam, Binna Kim, Kee-Taek Jang, Nayoung Han, Chong Woo Yoo, Jonghyuck Lee, Cheol Lee, Young-Gon Kim
Journal of Korean Medical Science, volume 40, issue 35, e220. Published 2025-09-08. DOI: https://doi.org/10.3346/jkms.2025.40.e220. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12418205/pdf/
Large-Scale Dermatopathology Dataset for Lesion Segmentation: Model Development and Analysis.
Background: With the increasing incidence of skin cancer, the workload for pathologists has surged. The diagnosis of skin samples, especially for complex lesions such as malignant melanomas and melanocytic lesions, has shown higher diagnostic variability than that of samples from other organs. Consequently, artificial intelligence (AI)-based diagnostic assistance programs are increasingly needed to support dermatopathologists in achieving more consistent diagnoses. However, large-scale skin pathology image datasets for AI learning are often insufficient or limited to specific diseases. This study aimed to build and assess a large-scale dermatopathology image dataset for an AI model.
Methods: We trained and evaluated a lesion segmentation model based on this dataset, which consisted of over 34,376 histopathology slide images collected from four institutions, including normal skin and six common types of skin lesions: epidermal cysts, seborrheic keratosis, Bowen disease/squamous cell carcinoma, basal cell carcinoma, melanocytic nevus, and malignant melanoma. Each image was accompanied by labeled data consisting of lesion area annotations and clinical information. To ensure the high quality and accuracy of the dataset, we employed data quality management methods, including syntactic accuracy, semantic accuracy, statistical diversity, and validity evaluation.
Results: The results of the dataset quality assessment confirmed high quality, with syntactic accuracy and semantic accuracy at 0.99 and 0.95, respectively. Statistical diversity was verified to follow a natural distribution. The validity evaluation verified the strong performance of the segmentation model for each group of data, with a Dice score ranging from 80% to 91%.
Conclusion: The results demonstrated that our constructed dataset provides a well-suited resource for deep learning training, offering a large-scale multi-institutional dermatopathology dataset that can drive advancements in AI-driven dermatopathology diagnosis.
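For readers unfamiliar with the Dice score cited in the Results, the following is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), applied to two binary lesion masks. The abstract does not say how the authors computed or aggregated the score (per image, per class, or pooled over pixels), so the function name `dice_score`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not the study's implementation.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def dice_score(pred: Mask, target: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks of equal shape.

    Assumption (not from the paper): two empty masks count as perfect agreement.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as lesion in both masks
    pred_area = 0     # pixels marked as lesion in the prediction
    target_area = 0   # pixels marked as lesion in the annotation
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p_on, t_on = bool(p), bool(t)
            intersection += int(p_on and t_on)
            pred_area += int(p_on)
            target_area += int(t_on)

    if pred_area + target_area == 0:
        return 1.0
    return 2.0 * intersection / (pred_area + target_area)


# Toy 2x3 masks: 3 predicted pixels, 3 annotated pixels, 2 shared.
predicted = [[1, 1, 0], [0, 1, 0]]
annotated = [[1, 0, 0], [0, 1, 1]]
print(f"Dice = {dice_score(predicted, annotated):.3f}")
```

In this toy case the two masks share 2 of their 3 lesion pixels each, giving 2·2 / (3 + 3) ≈ 0.667; a reported range of 80% to 91% would correspond to values of 0.80 to 0.91 on this scale.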
About the journal:
The Journal of Korean Medical Science (JKMS) is an international, peer-reviewed Open Access journal of medicine published weekly in English. The Journal’s publisher is the Korean Academy of Medical Sciences (KAMS), Korean Medical Association (KMA). JKMS aims to publish evidence-based, scientific research articles from various disciplines of the medical sciences. The Journal welcomes articles of general interest to medical researchers especially when they contain original information. Articles on the clinical evaluation of drugs and other therapies, epidemiologic studies of the general population, studies on pathogenic organisms and toxic materials, and the toxicities and adverse effects of therapeutics are welcome.