Leonel Hidalgo, María Paz Salinas, Javiera Sepúlveda, Karina Carrasco, Pamela Romero, Alma Pedro, Soledad Vidaurre, Domingo Mery, Cristian Navarrete-Dechent
JEADV Clinical Practice, 4(1), 296-298. Published 2024-10-10. DOI: 10.1002/jvc2.546
https://onlinelibrary.wiley.com/doi/10.1002/jvc2.546
Citation count: 0
Creating a dermatologic database for artificial intelligence, a Chilean experience, and advice from ChatGPT
Since artificial intelligence (AI) has shown wide applicability to skin cancer diagnosis, creating comprehensive image datasets is key [1-4]. The availability of databases is increasing, but they have a low representation of higher phototypes and certain ethnic groups, and limited metadata [5]. Excluding specific populations perpetuates healthcare disparities in the AI era [6]. Due to the lack of diverse datasets, external use and validation of AI algorithms are not currently possible in our population. We started a project to create a Chilean AI database: the 'Trawa' database ('skin' in Mapuzungun, a native Chilean language). This study aims to describe the current characteristics of our dataset along with the limitations encountered during its creation.
This was a retrospective study approved by the local Institutional Review Board (IRB). The images were collected from January 2019 to December 2020, from four dermatologists working in a tertiary care academic hospital. Clinical and dermoscopy images were obtained with various smartphones. All included lesions are biopsy-proven. Metadata (i.e., age, sex, anatomical location, histopathological details, relevant past medical history, and phototype) were obtained from the electronic medical records. Cases were coded in a specific folder. All data were stored on a Health Insurance Portability and Accountability Act (HIPAA)-compliant web hosting service.
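As an illustration only, the metadata fields listed above could be held in one record per case. The sketch below assumes Python; the class name `LesionCase`, the field names, the file paths and the example values are hypothetical and do not come from the Trawa project. It also assumes phototype is recorded on the six-level Fitzpatrick scale.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: phototype is recorded on the six-level Fitzpatrick scale.
PHOTOTYPES = ("I", "II", "III", "IV", "V", "VI")


@dataclass
class LesionCase:
    """One biopsy-proven case; every name here is hypothetical."""

    case_id: str
    age: int
    sex: str
    anatomical_location: str
    histopathology: str
    medical_history: str
    phototype: str
    image_paths: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values that cannot be valid before they reach the dataset.
        if self.phototype not in PHOTOTYPES:
            raise ValueError(f"unknown phototype: {self.phototype!r}")
        if self.age < 0:
            raise ValueError("age must be non-negative")


# Invented example values, shown only to demonstrate the record shape.
example = LesionCase(
    case_id="case_0001",
    age=54,
    sex="F",
    anatomical_location="head and neck",
    histopathology="basal cell carcinoma",
    medical_history="none recorded",
    phototype="III",
    image_paths=["case_0001/clinical_01.jpg", "case_0001/dermoscopy_01.jpg"],
)
```

Validating each record at construction time is one way to catch transcription errors from the medical records early; a real pipeline would likely need stricter checks and de-identification steps not shown here.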
During the study period, we collected 860 individual cases consisting of 4435 clinical and dermoscopy images (Figure 1), organized in seven categories: actinic keratosis, basal cell carcinoma, cutaneous squamous cell carcinoma, melanoma, naevus, seborrhoeic keratosis and others (angiomas, warts, etc.) (Table 1). Regarding metadata, 52.6% were women, the average age was 54 years, 32.8% had photodamage and 70.2% were phototype III. Most cases were located on the head and neck (50.6%), and 26.8% of the diagnoses were malignant.
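The reported figures can be turned into rough per-case numbers with simple arithmetic. The sketch below assumes every percentage is computed over all 860 cases, which the text does not state explicitly, so the counts are approximations rather than values from Table 1.

```python
# Figures reported in the text.
n_cases = 860
n_images = 4435

# Mean number of images per case.
images_per_case = n_images / n_cases

# Percentages reported in the text.
reported_pct = {
    "women": 52.6,
    "photodamage": 32.8,
    "phototype III": 70.2,
    "head and neck": 50.6,
    "malignant": 26.8,
}

# Assumption: each percentage uses all 860 cases as its denominator.
approx_counts = {
    name: round(n_cases * pct / 100) for name, pct in reported_pct.items()
}

print(f"images per case: {images_per_case:.2f}")
for name, count in approx_counts.items():
    print(f"{name}: ~{count} cases")
```

Under that assumption, the dataset averages about 5.16 images per case, with roughly 230 malignant cases and roughly 604 phototype III cases.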
Finally, we also suggest working with multidisciplinary teams composed of dermatologists and computer science professionals. Creating and improving databases will augment the performance of AI algorithms [9], and for us, this is a necessary step towards collaborative work with other countries in the region (e.g., Latin America) [3]. Potential applications of the current database include training algorithms fine-tuned for local data, as well as comparing the performance of different algorithms across diverse databases. The main limitation of our database is its relatively small size. Organising lesions requires a large team and multiple resources. Also, we have included only lesions with histopathological confirmation, biasing the database towards more 'suspicious' lesions. Using noninvasive imaging technologies such as reflectance confocal microscopy could be an alternative way to include nonbiopsied benign lesions [10].
Acquisition, analysis, and interpretation of data: Leonel Hidalgo, María Paz Salinas, Javiera Sepúlveda, Karina Carrasco, Pamela Romero, Alma Pedro, Soledad Vidaurre, Domingo Mery and Cristian Navarrete-Dechent. Drafting and revising the article: Leonel Hidalgo, María Paz Salinas, Javiera Sepúlveda, Karina Carrasco, Pamela Romero, Alma Pedro, Soledad Vidaurre, Domingo Mery and Cristian Navarrete-Dechent. Final approval of the version to be published: Leonel Hidalgo, María Paz Salinas, Javiera Sepúlveda, Karina Carrasco, Pamela Romero, Alma Pedro, Soledad Vidaurre, Domingo Mery and Cristian Navarrete-Dechent.
This work was funded in part by ANID—Millennium Science Initiative Programme ICN2021_004.
The authors declare no conflict of interest.
Reviewed and approved by the Scientific Ethical Committee for Health Sciences of the Pontificia Universidad Católica de Chile; approval #211213001.