Does AI need anything more than a single image to diagnose melanoma?
Authors: Aimilios Lallas, John Paoli
Journal: Journal of the European Academy of Dermatology and Venereology, vol. 39, no. 8, pp. 1378-1379
Publication date: 2025-07-25
DOI: 10.1111/jdv.20793
URL: https://onlinelibrary.wiley.com/doi/10.1111/jdv.20793
Citation count: 0
Abstract
Recent research consistently demonstrates the high accuracy of artificial intelligence (AI)-driven image analysis in diagnosing melanoma. The key benchmark for comparison has usually been the performance of human readers, who were outperformed by AI even in the initial experimental studies [1].
A significant drawback of these studies is their failure to replicate the clinical setting, due to the omission of parameters that are highly relevant in real-world scenarios [2]. Most studies used single clinical or dermoscopic images of lesions both for training algorithms and for evaluating the performance of algorithms and human raters. While this approach seems logical for an AI algorithm, it contrasts sharply with the practice of clinicians. Clinicians do not evaluate single images but examine unique individuals, considering a multitude of factors that contribute to a comprehensive evaluation. The clinical assessment encompasses factors such as phenotype, phototype, pigmentary traits, total lesion count, detailed analysis of lesion types with their distinct features and texture, and review of their evolution history. The failure of previous studies to include these important parameters was one of the main limitations to the applicability of their findings in clinical practice.
The study by Kurtansky et al. represents one of the first efforts to integrate contextual information into the training and evaluation of AI algorithms for melanoma diagnosis [3]. It reports on the outcomes of the 2020 SIIM-ISIC Melanoma Classification Challenge, in which 3308 teams from 97 countries submitted a total of 101,845 entries. Most importantly, this was the first initiative to employ a data set of patient-contextual lesion images to evaluate the influence of intrapatient lesion patterns on classifying melanoma. In the reader study, each index image was first assessed alone and then alongside seven additional dermoscopic images of nevi from the same patient.
The study reports two main findings. First, the top-performing AI algorithm for melanoma diagnosis achieved an area under the receiver operating characteristic curve (AUC) of 0.95. This result is consistent with trends of steadily improving algorithm performance in recent years, driven by the availability of larger training sets and ongoing advancements in deep learning techniques.
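To make the 0.95 figure concrete, the area under the ROC curve can be read as the probability that a randomly chosen melanoma receives a higher score than a randomly chosen benign lesion. The sketch below computes it by direct pairwise comparison. The labels and scores are invented toy values, not data from the challenge, and `roc_auc` is a hypothetical helper name rather than the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive (melanoma, label 1)
    scores higher than a random negative (benign, label 0); ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 melanomas and 5 benign lesions.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.92, 0.81, 0.40, 0.55, 0.30, 0.22, 0.10, 0.05]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 14 of 15 melanoma/benign pairs are ranked correctly
```

In this toy set, one melanoma (score 0.40) is ranked below one benign lesion (score 0.55), so 14 of the 15 pairs are ordered correctly. An AUC summarizes ranking quality across all thresholds and does not by itself fix the specificity at any operating point.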
Second, the study found that including patient-contextual lesion images had no significant effect on diagnostic accuracy, either for the algorithms or for the human readers. This result is somewhat unexpected and challenges the assumption that intrapatient lesion comparisons enhance diagnostic performance. Prior evidence suggests that melanoma detection can be enhanced by contextual information, as demonstrated by the comparative approach, an intrapatient assessment strategy [4]. Moreover, it is important to note that, in real-world clinical settings, clinicians operate with a specificity approaching 100%, due to their routine exposure to a much larger number of benign lesions. The lack of improvement in diagnostic performance among human readers in the study may indicate that presenting seven images of single lesions offers an insufficient approximation of the broader visual context typically available in clinical practice. This stark contrast in diagnostic environment further underscores the challenges of replicating clinical realism in algorithm evaluation frameworks.
Although the AI algorithms in the study largely ignored contextual information, their diagnostic accuracy remained very high. However, for clinical deployment, algorithms must achieve specificity approaching 100% to avoid a surge in unnecessary excisions of benign lesions [5]. Consequently, more effective integration of contextual information is essential to improve specificity, and future research efforts should prioritize this objective. Furthermore, AI-based image analysis is increasingly applied not only for diagnostic classification but also for prognostic assessments and prediction of treatment response [6]. Incorporating diverse contextual variables, such as patient history, lesion evolution and broader skin patterns, would likely enhance the predictive power of these models across multiple tasks.
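The specificity argument is simple arithmetic: false positives equal the number of benign lesions times (1 - specificity), so when benign lesions vastly outnumber melanomas, even a small shortfall in specificity produces many unnecessary excisions. The sketch below illustrates this with an invented cohort of 9900 benign lesions. The cohort size and specificity values are assumptions chosen for illustration, not figures from the study, and it further assumes that every false positive leads to an excision.

```python
def unnecessary_excisions(n_benign, specificity):
    """Expected number of benign lesions flagged as suspicious.

    Assumes every false positive leads to an excision (a simplification).
    """
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must lie in [0, 1]")
    return round(n_benign * (1.0 - specificity))


# Hypothetical screening cohort: 9900 benign lesions.
N_BENIGN = 9900

for spec in (0.90, 0.99, 0.999):
    fp = unnecessary_excisions(N_BENIGN, spec)
    print(f"specificity {spec:.1%}: about {fp} benign lesions excised")
```

Under these assumptions, moving from 90% to 99% specificity reduces the unnecessary excisions from 990 to 99, a tenfold drop.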
About the journal:
The Journal of the European Academy of Dermatology and Venereology (JEADV) is a publication that focuses on dermatology and venereology. It covers various topics within these fields, including both clinical and basic science subjects. The journal publishes articles in different formats, such as editorials, review articles, practice articles, original papers, short reports, letters to the editor, features, and announcements from the European Academy of Dermatology and Venereology (EADV).
The journal covers a wide range of keywords, including allergy, cancer, clinical medicine, cytokines, dermatology, drug reactions, hair disease, laser therapy, nail disease, oncology, skin cancer, skin disease, therapeutics, tumors, virus infections, and venereology.
The JEADV is indexed and abstracted by various databases and resources, including Abstracts on Hygiene & Communicable Diseases, Academic Search, AgBiotech News & Information, Botanical Pesticides, CAB Abstracts®, Embase, Global Health, InfoTrac, Ingenta Select, MEDLINE/PubMed, Science Citation Index Expanded, and others.