Diagnosis in Bytes: Comparing the Diagnostic Accuracy of Google and ChatGPT 3.5 as Diagnostic Support Tools
Guilherme Guimaraes, Caroline Santos Silva, Jean Carlos Z Contreras, Ricardo G Figueiredo, Ricardo B Tiraboschi, Cristiano M Gomes, Jose Bessa
{"title":"以字节为单位的诊断:比较Google和ChatGPT 3.5作为诊断支持工具的诊断准确性","authors":"Guilherne Guimaraes, Caroline Santos Silva, Jean Carlos Z Contreras, Ricardo G Figueiredo, Ricardo B Tiraboschi, Cristiano M Gomes, Jose Bessa","doi":"10.1101/2023.11.10.23294668","DOIUrl":null,"url":null,"abstract":"Objective: Adopting digital technologies as diagnostic support tools in medicine is unquestionable. However, the accuracy in suggesting diagnoses remains controversial and underexplored. We aimed to evaluate and compare the diagnostic accuracy of two primary and accessible internet search tools: Google and ChatGPT 3.5. Method: We used 60 clinical cases related to urological pathologies to evaluate both platforms. These cases were divided into two groups: one with common conditions (constructed from the most frequent symptoms, following EAU and UpToDate guidelines) and another with rare disorders - based on case reports published between 2022 and 2023 in Urology Case Reports. Each case was inputted into Google Search and ChatGPT 3.5, and the results were categorized as \"correct diagnosis,\" \"likely differential diagnosis,\" or \"incorrect diagnosis.\" A team of researchers evaluated the responses blindly and randomly. Results: In typical cases, Google achieved 53.3% accuracy, offering a likely differential diagnosis in 23.3% and errors in the rest. ChatGPT 3.5 exhibited superior performance, with 86.6% accuracy, and suggested a reasonable differential diagnosis in 13.3%, without mistakes. In rare cases, Google did not provide correct diagnoses but offered a likely differential diagnosis in 20%. ChatGPT 3.5 achieved 16.6% accuracy, with 50% differential diagnoses. Conclusion: ChatGPT 3.5 demonstrated higher diagnostic accuracy than Google in both contexts. The platform showed acceptable accuracy in common cases; however, limitations in rare cases remained evident.","PeriodicalId":478577,"journal":{"name":"medRxiv (Cold Spring Harbor Laboratory)","volume":"4 10","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Diagnosis in Bytes: Comparing the Diagnostic Accuracy of Google and ChatGPT 3.5 as Diagnostic Support Tools\",\"authors\":\"Guilherne Guimaraes, Caroline Santos Silva, Jean Carlos Z Contreras, Ricardo G Figueiredo, Ricardo B Tiraboschi, Cristiano M Gomes, Jose Bessa\",\"doi\":\"10.1101/2023.11.10.23294668\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Objective: Adopting digital technologies as diagnostic support tools in medicine is unquestionable. However, the accuracy in suggesting diagnoses remains controversial and underexplored. We aimed to evaluate and compare the diagnostic accuracy of two primary and accessible internet search tools: Google and ChatGPT 3.5. Method: We used 60 clinical cases related to urological pathologies to evaluate both platforms. These cases were divided into two groups: one with common conditions (constructed from the most frequent symptoms, following EAU and UpToDate guidelines) and another with rare disorders - based on case reports published between 2022 and 2023 in Urology Case Reports. Each case was inputted into Google Search and ChatGPT 3.5, and the results were categorized as \\\"correct diagnosis,\\\" \\\"likely differential diagnosis,\\\" or \\\"incorrect diagnosis.\\\" A team of researchers evaluated the responses blindly and randomly. 
Results: In typical cases, Google achieved 53.3% accuracy, offering a likely differential diagnosis in 23.3% and errors in the rest. ChatGPT 3.5 exhibited superior performance, with 86.6% accuracy, and suggested a reasonable differential diagnosis in 13.3%, without mistakes. In rare cases, Google did not provide correct diagnoses but offered a likely differential diagnosis in 20%. ChatGPT 3.5 achieved 16.6% accuracy, with 50% differential diagnoses. Conclusion: ChatGPT 3.5 demonstrated higher diagnostic accuracy than Google in both contexts. The platform showed acceptable accuracy in common cases; however, limitations in rare cases remained evident.\",\"PeriodicalId\":478577,\"journal\":{\"name\":\"medRxiv (Cold Spring Harbor Laboratory)\",\"volume\":\"4 10\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"medRxiv (Cold Spring Harbor Laboratory)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1101/2023.11.10.23294668\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv (Cold Spring Harbor Laboratory)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2023.11.10.23294668","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Objective: The adoption of digital technologies as diagnostic support tools in medicine is beyond question; however, the accuracy of the diagnoses these tools suggest remains controversial and underexplored. We aimed to evaluate and compare the diagnostic accuracy of two widely used and accessible internet tools: Google Search and ChatGPT 3.5.

Method: We evaluated both platforms using 60 clinical cases related to urological pathologies, divided into two groups: common conditions, constructed from the most frequent presenting symptoms according to EAU and UpToDate guidelines, and rare disorders, drawn from case reports published between 2022 and 2023 in Urology Case Reports. Each case was entered into Google Search and ChatGPT 3.5, and each response was classified as "correct diagnosis," "likely differential diagnosis," or "incorrect diagnosis." A team of researchers rated the responses in a blinded fashion and in random order.

Results: In the common cases, Google achieved 53.3% accuracy, offered a likely differential diagnosis in 23.3%, and was incorrect in the remainder. ChatGPT 3.5 performed better, with 86.6% accuracy and a reasonable differential diagnosis in the remaining 13.3%, with no incorrect answers. In the rare cases, Google provided no correct diagnoses and offered a likely differential diagnosis in 20%. ChatGPT 3.5 achieved 16.6% accuracy and suggested a likely differential diagnosis in 50%.

Conclusion: ChatGPT 3.5 demonstrated higher diagnostic accuracy than Google in both contexts. It showed acceptable accuracy in common cases, but its limitations with rare conditions remained evident.
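The reported percentages are simple category shares over each group of cases. As a minimal sketch of that tallying (assuming, as the round fractions suggest but the abstract does not state, 30 cases per group, and using counts chosen purely for illustration), the arithmetic reduces to:

    from collections import Counter

    # Hypothetical per-case classifications for one platform on one group.
    # The three labels mirror the study's categories; the counts are
    # illustrative only (chosen so 16/30 reproduces 53.3%).
    responses = (
        ["correct diagnosis"] * 16
        + ["likely differential diagnosis"] * 7
        + ["incorrect diagnosis"] * 7
    )

    counts = Counter(responses)
    total = len(responses)
    for label in ("correct diagnosis",
                  "likely differential diagnosis",
                  "incorrect diagnosis"):
        print(f"{label}: {counts[label]}/{total} = {100 * counts[label] / total:.1f}%")

With these assumed counts the script prints 53.3%, 23.3%, and 23.3%, matching the breakdown reported for Google on the common cases.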