Chat Generative Pretrained Transformer-4.0's accuracy in assessing cervical vertebrae and hand-wrist maturation stages: A retrospective study.

IF 3 | Zone 2, Medicine | Q1 DENTISTRY, ORAL SURGERY & MEDICINE

American Journal of Orthodontics and Dentofacial Orthopedics Pub Date : 2025-09-29 DOI:10.1016/j.ajodo.2025.08.010

Meryem Akpınar, Farhad Salmanpour


Citations: 0

Abstract

Introduction: This study aimed to evaluate the diagnostic accuracy of Chat Generative Pretrained Transformer version 4.0 (ChatGPT-4.0) in determining cervical vertebrae and hand-wrist maturation stages using cephalometric and hand-wrist radiographic films.

Methods: A retrospective analysis was conducted on 238 subjects who had cephalometric and hand-wrist radiographs taken on the same day. Each hand-wrist maturation stage was independently evaluated by 3 orthodontists using the method described by Björk and Helm, whereas cervical vertebrae maturation stages were assessed following the methodology proposed by Baccetti and coworkers. These evaluations served as the reference standard for measuring the performance of ChatGPT-4.0. The hand-wrist and cephalometric radiographs were analyzed by ChatGPT-4.0, and the results were recorded by the primary researcher.

Results: Among the hand-wrist maturation stages, the model performed best at the RU stage, with an area under the curve (AUC) value of 0.89. In the PP3U and MP3U stages, precision was high but recall was low, indicating that the model missed some true instances of these stages. In other stages, particularly DP3U and MP3CAP, both precision and recall were low, which limited classification accuracy. Among the cervical vertebral maturation stages (CVS), the model performed best in CVS1 (AUC, 0.82; precision, 0.806), with a relatively favorable AUC in CVS2 (0.77). However, its predictive performance in the CVS3 and CVS6 stages was suboptimal (AUC <0.67).
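The per-stage precision, recall, and AUC figures reported in the Results can be read as one-vs-rest metrics, in which each stage in turn is treated as the positive class. The following is a minimal sketch of that reading, not the authors' analysis code. It assumes the model returns a single hard stage label per radiograph, in which case the ROC curve has one operating point and the AUC reduces to the mean of sensitivity and specificity. The function name `one_vs_rest_metrics` and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def one_vs_rest_metrics(reference, predicted, stage):
    """Precision, recall, and single-point AUC with `stage` as the positive class."""
    counts = Counter(
        (ref == stage, pred == stage) for ref, pred in zip(reference, predicted)
    )
    tp = counts[(True, True)]    # stage present and predicted
    fn = counts[(True, False)]   # stage present but missed
    fp = counts[(False, True)]   # stage predicted but absent
    tn = counts[(False, False)]  # stage absent and not predicted

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    # Assumption: hard labels only, so the ROC curve passes through one
    # point (1 - specificity, recall) and its area is their simple mean.
    auc = (recall + specificity) / 2
    return {"precision": precision, "recall": recall, "auc": auc}


# Made-up toy labels for illustration only (not study data).
reference = ["RU", "RU", "PP3U", "PP3U", "PP3U", "PP3U", "MP3CAP", "MP3CAP"]
predicted = ["RU", "RU", "PP3U", "RU", "MP3CAP", "MP3CAP", "MP3CAP", "MP3CAP"]

metrics = one_vs_rest_metrics(reference, predicted, "PP3U")
print(metrics)
```

In this toy case the single PP3U prediction is correct, so precision is 1.0, but three of the four true PP3U radiographs are assigned to other stages, so recall is 0.25. This is the same high-precision, low-recall pattern the abstract describes for PP3U and MP3U.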

Conclusions: ChatGPT-4.0 demonstrated accurate predictions in the RU and CVS1 stages. However, its overall performance was found to be inferior to that of other artificial intelligence models.


Source journal

American Journal of Orthodontics and Dentofacial Orthopedics (Medicine: Dentistry and Oral Surgery)

CiteScore: 4.80

Self-citation rate: 13.30%

Articles published: 432

Review time: 66 days

Journal description: Published for more than 100 years, the American Journal of Orthodontics and Dentofacial Orthopedics remains the leading orthodontic resource. It is the official publication of the American Association of Orthodontists, its constituent societies, the American Board of Orthodontics, and the College of Diplomates of the American Board of Orthodontics. Each month its readers have access to original peer-reviewed articles that examine all phases of orthodontic treatment. Illustrated throughout, the publication includes tables, color photographs, and statistical data. Coverage includes successful diagnostic procedures, imaging techniques, bracket and archwire materials, extraction and impaction concerns, orthognathic surgery, TMJ disorders, removable appliances, and adult therapy.