Çağrı Becerik, Selçuk Yıldız, Çiğdem Tepe Karaca, Sema Zer Toros
{"title":"Evaluation of the Usability of ChatGPT-4 and Google Gemini in Patient Education About Rhinosinusitis.","authors":"Çağrı Becerik, Selçuk Yıldız, Çiğdem Tepe Karaca, Sema Zer Toros","doi":"10.1111/coa.14273","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Artificial intelligence (AI) based chat robots are increasingly used by users for patient education about common diseases in the health field, as in every field. This study aims to evaluate and compare patient education materials on rhinosinusitis created by two frequently used chat robots, ChatGPT-4 and Google Gemini.</p><p><strong>Method: </strong>One hundred nine questions taken from patient information websites were divided into 4 different categories: general knowledge, diagnosis, treatment, surgery and complications, then asked to chat robots. The answers given were evaluated by two different expert otolaryngologists, and on questions where the scores were different, a third, more experienced otolaryngologist finalised the evaluation. Questions were scored from 1 to 4: (1) comprehensive/correct, (2) incomplete/partially correct, (3) accurate and inaccurate data, potentially misleading and (4) completely inaccurate/irrelevant.</p><p><strong>Results: </strong>In evaluating the answers given by ChatGPT-4, all answers in the Diagnosis category were evaluated as comprehensive/correct. In the evaluation of the answers given by Google Gemini, the answers evaluated as completely inaccurate/irrelevant in the treatment category were found to be statistically significantly higher, and the answers evaluated as incomplete/partially correct in the surgery and complications category were found to be statistically significantly higher. In the comparison between the two chat robots, in the treatment category, ChatGPT-4 had a higher correct evaluation rate than Google Gemini and was found to be statistically significant.</p><p><strong>Conclusion: </strong>The answers given by ChatGPT-4 and Google Gemini chat robots regarding rhinosinusitis were evaluated as sufficient and informative.</p>","PeriodicalId":10431,"journal":{"name":"Clinical Otolaryngology","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Otolaryngology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/coa.14273","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OTORHINOLARYNGOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Introduction: Artificial intelligence (AI)-based chatbots are increasingly used for patient education about common diseases, in healthcare as in many other fields. This study aims to evaluate and compare patient education materials on rhinosinusitis produced by two widely used chatbots, ChatGPT-4 and Google Gemini.
Method: One hundred nine questions taken from patient information websites were divided into four categories (general knowledge; diagnosis; treatment; surgery and complications) and posed to both chatbots. Two expert otolaryngologists independently evaluated the answers; where their scores differed, a third, more experienced otolaryngologist made the final decision. Answers were scored from 1 to 4: (1) comprehensive/correct, (2) incomplete/partially correct, (3) a mix of accurate and inaccurate data, potentially misleading, and (4) completely inaccurate/irrelevant.
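To make the adjudication step concrete, the following is a minimal sketch in Python of the two-rater workflow described above, using hypothetical scores; it illustrates the stated protocol and is not the authors' actual analysis code.

    # Sketch of the two-rater scoring workflow with hypothetical data.
    # Scores: 1 = comprehensive/correct, 2 = incomplete/partially correct,
    # 3 = mixed accurate/inaccurate (potentially misleading),
    # 4 = completely inaccurate/irrelevant.
    from collections import Counter

    # Hypothetical scores for five answers in one category.
    rater_1 = [1, 2, 1, 3, 1]
    rater_2 = [1, 1, 1, 3, 2]
    rater_3 = [1, 1, 1, 3, 1]  # senior rater, consulted only on disagreements

    def final_scores(r1, r2, r3):
        # Where the two raters agree, keep the shared score;
        # otherwise defer to the third, more experienced rater.
        return [a if a == b else c for a, b, c in zip(r1, r2, r3)]

    scores = final_scores(rater_1, rater_2, rater_3)
    print(Counter(scores))  # prints Counter({1: 4, 3: 1})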
Results: All of ChatGPT-4's answers in the diagnosis category were rated comprehensive/correct. Among Google Gemini's answers, those rated completely inaccurate/irrelevant were statistically significantly more frequent in the treatment category, and those rated incomplete/partially correct were statistically significantly more frequent in the surgery and complications category. In the head-to-head comparison, ChatGPT-4 had a statistically significantly higher rate of correct answers than Google Gemini in the treatment category.
Conclusion: The answers given by the ChatGPT-4 and Google Gemini chatbots regarding rhinosinusitis were evaluated as sufficient and informative.
About the Journal
Clinical Otolaryngology is a bimonthly journal devoted to clinically oriented research papers of the highest scientific standards dealing with:
current otorhinolaryngological practice
audiology, otology, balance, rhinology, larynx, voice and paediatric ORL
head and neck oncology
head and neck plastic and reconstructive surgery
continuing medical education and ORL training
The emphasis is on high quality new work in the clinical field and on fresh, original research.
Each issue begins with an editorial expressing the personal opinions of an individual with a particular knowledge of a chosen subject. The main body of each issue is then devoted to original papers carrying important results for those working in the field. In addition, topical review articles are published discussing a particular subject in depth, including not only the opinions of the author but also any controversies surrounding the subject.
• Negative/null results
In order for research to advance, negative results, which often make a valuable contribution to the field, should be published. However, articles containing negative or null results are frequently rejected or not considered for publication by journals. We welcome papers of this kind, provided they include appropriate and valid power calculations that give confidence that the negative result can be relied upon.