Diagnosis in Bytes: Comparing the Diagnostic Accuracy of Google and ChatGPT 3.5 as Diagnostic Support Tools
Guilherme Guimaraes, Caroline Santos Silva, Jean Carlos Z Contreras, Ricardo G Figueiredo, Ricardo B Tiraboschi, Cristiano M Gomes, Jose Bessa
{"title":"以字节为单位的诊断:比较Google和ChatGPT 3.5作为诊断支持工具的诊断准确性","authors":"Guilherne Guimaraes, Caroline Santos Silva, Jean Carlos Z Contreras, Ricardo G Figueiredo, Ricardo B Tiraboschi, Cristiano M Gomes, Jose Bessa","doi":"10.1101/2023.11.10.23294668","DOIUrl":null,"url":null,"abstract":"Objective: Adopting digital technologies as diagnostic support tools in medicine is unquestionable. However, the accuracy in suggesting diagnoses remains controversial and underexplored. We aimed to evaluate and compare the diagnostic accuracy of two primary and accessible internet search tools: Google and ChatGPT 3.5. Method: We used 60 clinical cases related to urological pathologies to evaluate both platforms. These cases were divided into two groups: one with common conditions (constructed from the most frequent symptoms, following EAU and UpToDate guidelines) and another with rare disorders - based on case reports published between 2022 and 2023 in Urology Case Reports. Each case was inputted into Google Search and ChatGPT 3.5, and the results were categorized as \"correct diagnosis,\" \"likely differential diagnosis,\" or \"incorrect diagnosis.\" A team of researchers evaluated the responses blindly and randomly. Results: In typical cases, Google achieved 53.3% accuracy, offering a likely differential diagnosis in 23.3% and errors in the rest. ChatGPT 3.5 exhibited superior performance, with 86.6% accuracy, and suggested a reasonable differential diagnosis in 13.3%, without mistakes. In rare cases, Google did not provide correct diagnoses but offered a likely differential diagnosis in 20%. ChatGPT 3.5 achieved 16.6% accuracy, with 50% differential diagnoses. Conclusion: ChatGPT 3.5 demonstrated higher diagnostic accuracy than Google in both contexts. The platform showed acceptable accuracy in common cases; however, limitations in rare cases remained evident.","PeriodicalId":478577,"journal":{"name":"medRxiv (Cold Spring Harbor Laboratory)","volume":"4 10","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Diagnosis in Bytes: Comparing the Diagnostic Accuracy of Google and ChatGPT 3.5 as Diagnostic Support Tools\",\"authors\":\"Guilherne Guimaraes, Caroline Santos Silva, Jean Carlos Z Contreras, Ricardo G Figueiredo, Ricardo B Tiraboschi, Cristiano M Gomes, Jose Bessa\",\"doi\":\"10.1101/2023.11.10.23294668\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Objective: Adopting digital technologies as diagnostic support tools in medicine is unquestionable. However, the accuracy in suggesting diagnoses remains controversial and underexplored. We aimed to evaluate and compare the diagnostic accuracy of two primary and accessible internet search tools: Google and ChatGPT 3.5. Method: We used 60 clinical cases related to urological pathologies to evaluate both platforms. These cases were divided into two groups: one with common conditions (constructed from the most frequent symptoms, following EAU and UpToDate guidelines) and another with rare disorders - based on case reports published between 2022 and 2023 in Urology Case Reports. Each case was inputted into Google Search and ChatGPT 3.5, and the results were categorized as \\\"correct diagnosis,\\\" \\\"likely differential diagnosis,\\\" or \\\"incorrect diagnosis.\\\" A team of researchers evaluated the responses blindly and randomly. 
Results: In typical cases, Google achieved 53.3% accuracy, offering a likely differential diagnosis in 23.3% and errors in the rest. ChatGPT 3.5 exhibited superior performance, with 86.6% accuracy, and suggested a reasonable differential diagnosis in 13.3%, without mistakes. In rare cases, Google did not provide correct diagnoses but offered a likely differential diagnosis in 20%. ChatGPT 3.5 achieved 16.6% accuracy, with 50% differential diagnoses. Conclusion: ChatGPT 3.5 demonstrated higher diagnostic accuracy than Google in both contexts. The platform showed acceptable accuracy in common cases; however, limitations in rare cases remained evident.\",\"PeriodicalId\":478577,\"journal\":{\"name\":\"medRxiv (Cold Spring Harbor Laboratory)\",\"volume\":\"4 10\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"medRxiv (Cold Spring Harbor Laboratory)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1101/2023.11.10.23294668\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv (Cold Spring Harbor Laboratory)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2023.11.10.23294668","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Objective: The adoption of digital technologies as diagnostic support tools in medicine is beyond question; however, the accuracy of the diagnoses these tools suggest remains controversial and underexplored. We aimed to evaluate and compare the diagnostic accuracy of two widely used and accessible internet tools: Google Search and ChatGPT 3.5.

Method: We evaluated both platforms using 60 clinical cases related to urological pathologies, divided into two groups: common conditions, constructed from the most frequent presenting symptoms according to EAU and UpToDate guidelines, and rare disorders, drawn from case reports published between 2022 and 2023 in Urology Case Reports. Each case was entered into Google Search and ChatGPT 3.5, and each response was classified as "correct diagnosis," "likely differential diagnosis," or "incorrect diagnosis." A team of researchers rated the responses in a blinded fashion and in random order.

Results: In the common cases, Google achieved 53.3% accuracy, offered a likely differential diagnosis in 23.3%, and was incorrect in the remainder. ChatGPT 3.5 performed better, with 86.6% accuracy and a reasonable differential diagnosis in the remaining 13.3%, with no incorrect answers. In the rare cases, Google provided no correct diagnoses and offered a likely differential diagnosis in 20%. ChatGPT 3.5 achieved 16.6% accuracy and suggested a likely differential diagnosis in 50%.

Conclusion: ChatGPT 3.5 demonstrated higher diagnostic accuracy than Google in both contexts. It showed acceptable accuracy in common cases, but its limitations with rare conditions remained evident.
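The reported percentages are simple category shares over each group of cases. As a minimal sketch of that tallying (assuming, as the round fractions suggest but the abstract does not state, 30 cases per group, and using counts chosen purely for illustration), the arithmetic reduces to:

    from collections import Counter

    # Hypothetical per-case classifications for one platform on one group.
    # The three labels mirror the study's categories; the counts are
    # illustrative only (chosen so 16/30 reproduces 53.3%).
    responses = (
        ["correct diagnosis"] * 16
        + ["likely differential diagnosis"] * 7
        + ["incorrect diagnosis"] * 7
    )

    counts = Counter(responses)
    total = len(responses)
    for label in ("correct diagnosis",
                  "likely differential diagnosis",
                  "incorrect diagnosis"):
        print(f"{label}: {counts[label]}/{total} = {100 * counts[label] / total:.1f}%")

With these assumed counts the script prints 53.3%, 23.3%, and 23.3%, matching the breakdown reported for Google on the common cases.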