Diagnostic performance of ChatGPT-4.0 in histopathological description analysis of oral and maxillofacial lesions: a comparative study with pathologists.
Maria Cuevas-Nunez, Valentina Ignacia Alvarez Silberberg, Maria Arregui, Bruno C Jham, Rosa Ballester-Victoria, Inessa Koptseva, María José Biosca Gómez de Tejada, Rodolfo Posada-Caez, Victor Gil Manich, Javier Bara-Casaus, Maria-Teresa Fernández-Figueras
Journal: Oral Surgery Oral Medicine Oral Pathology Oral Radiology
Publication date: 2024-11-28
Publication type: Journal Article
DOI: 10.1016/j.oooo.2024.11.087 (https://doi.org/10.1016/j.oooo.2024.11.087)
Citations: 0
Abstract
Objective: To evaluate the diagnostic performance of ChatGPT-4.0 in histopathological diagnoses of oral and maxillofacial lesions and compare its performance with pathologists.
Study design: A retrospective analysis of 102 histopathological descriptions was conducted. Anonymized data, including site, age and sex, were obtained from the General University Hospital's Department of Pathology. ChatGPT-4.0 provided diagnoses, which were categorized as correct, similar, or different compared to pathologists' diagnoses. Descriptive statistics, Chi-squared tests, correlation, and regression analyses were used to assess accuracy and the influence of age and gender.
Results: ChatGPT-4.0 correctly diagnosed 61 out of 102 cases, yielding an accuracy of 59.8%. The distribution of diagnostic scores did not significantly deviate from expectations (Chi-squared statistic = 0.0, P = 1.0). A moderate negative correlation between age and diagnostic scores (r = -0.33) was observed, with age significantly predicting scores (P = .001). No significant difference was found between genders (P = .26). ChatGPT-4.0 performed worst with granuloma and inflammation cases (100% incorrect) and best with mucocele cases (93.3% correct).
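The headline accuracy can be checked from the reported counts (61 correct out of 102). The sketch below recomputes it and, purely as an illustration, attaches a 95% Wilson score interval. The abstract does not report a confidence interval, so the interval, the function names `accuracy` and `wilson_interval`, and the choice of the Wilson method are assumptions made here, not part of the authors' analysis.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Proportion of cases diagnosed correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def wilson_interval(correct: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: this interval is illustrative only; the source abstract
    reports the point estimate (59.8%) and no interval.
    """
    p = accuracy(correct, total)
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Counts taken from the Results paragraph: 61 correct of 102 cases.
acc = accuracy(61, 102)
low, high = wilson_interval(61, 102)

print(f"accuracy = {acc:.1%}")  # accuracy = 59.8%
print(f"95% Wilson interval = ({low:.1%}, {high:.1%})")
```

With only 102 cases the interval is fairly wide, which is one way to read the abstract's description of the accuracy as "moderate".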
Conclusion: ChatGPT-4.0 shows moderate accuracy in histopathological diagnosis of oral and maxillofacial lesions, with performance varying by lesion type. Improvements are needed to enhance its clinical reliability.
About the journal:
Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology is required reading for anyone in the fields of oral surgery, oral medicine, oral pathology, oral radiology or advanced general practice dentistry. It is the only major dental journal that provides a practical and complete overview of the medical and surgical techniques of dental practice in four areas. Topics covered include such current issues as dental implants, treatment of HIV-infected patients, and evaluation and treatment of TMJ disorders. The official publication for nine societies, the Journal is recommended for initial purchase in the Brandon Hill study, Selected List of Books and Journals for the Small Medical Library.