Evaluating GPT-4V’s performance in the Japanese national dental examination: A challenge explored

IF 3.1 · Zone 3 (Medicine) · JCR Q1: DENTISTRY, ORAL SURGERY & MEDICINE

Journal of Dental Sciences Pub Date : 2024-07-01 DOI:10.1016/j.jds.2023.12.007

Masaki Morishita, Hikaru Fukuda, Kosuke Muraoka, Taiji Nakamura, Masanari Hayashi, Izumi Yoshioka, Kentaro Ono, Shuji Awano

Volume 19, Issue 3, Pages 1595-1600
https://www.sciencedirect.com/science/article/pii/S1991790223003999

Citations: 0

Abstract


Evaluating GPT-4V’s performance in the Japanese national dental examination: A challenge explored

Background/purpose

Rapid advancements in AI technology have led to significant interest in its application across various fields, including medicine and dentistry. This study aimed to assess the capabilities of ChatGPT-4V with image recognition in answering image-based questions from the Japanese National Dental Examination (JNDE) to explore its potential as an educational support tool for dental students.

Materials and methods

The dataset used questions from the JNDE, which was conducted in January 2023, with a focus on image-related queries. ChatGPT-4V was utilized, and standardized prompts, question texts, and images were input. Data and statistical analyses were conducted using Qlik Sense® and GraphPad Prism.

Results

The overall correct response rate of ChatGPT-4V for image-based JNDE questions was 35.0 %. The correct response rates were 57.1 % for compulsory questions, 43.6 % for general questions, and 28.6 % for clinical practical questions. In specialties like Dental Anesthesiology and Endodontics, ChatGPT-4V achieved correct response rates above 70 %, while response rates for Orthodontics and Oral Surgery were lower. A higher number of images in questions was correlated with lower accuracy, suggesting an impact of the number of images on correct and incorrect responses.

Conclusion

While innovative, ChatGPT-4V’s image recognition feature exhibited limitations, especially in handling image-intensive and complex clinical practical questions, and is not yet fully suitable as an educational support tool for dental students at its current stage. Further technological refinement and re-evaluation with a broader dataset are recommended.


Source journal

Journal of Dental Sciences (Medicine: Dentistry and Oral Surgery)

CiteScore: 5.10

Self-citation rate: 14.30%

Articles published: 348

Review time: 6 days

Journal description: The Journal of Dental Sciences (JDS), published quarterly, is the official open access publication of the Association for Dental Sciences of the Republic of China (ADS-ROC). The predecessor of the JDS is the Chinese Dental Journal (CDJ), which had been covered by MEDLINE since 1988. As the CDJ continued to prove its importance in the region, the ADS-ROC decided to reach the international community by publishing an English-language journal, which led to the launch of the JDS in 2006. The JDS has been indexed in SCI Expanded since 2008. It is also indexed in Scopus, EMCare, ScienceDirect, and SIIC Data Bases. The topics covered by the JDS include all fields of basic and clinical dentistry. Manuscripts on endemic diseases in particular regions of any country, such as dental caries and periodontal diseases, as well as on oral pre-cancers, oral cancers, and oral submucous fibrosis related to betel nut chewing, are also considered for publication. The JDS also publishes articles on the efficacy of new treatment modalities for oral verrucous hyperplasia or early oral squamous cell carcinoma.