[Translated article] Exploring the potential of artificial intelligence in traumatology: Conversational answers to specific questions.

Authors: F Canillas Del Rey, M Canillas Arias
DOI: 10.1016/j.recot.2024.11.005
Journal: Revista Espanola de Cirugia Ortopedica y Traumatologia
Publication date: 2024-11-07
Background and objective: Generative artificial intelligence is a technology that provides greater connectivity with people through conversational bots ("chatbots"). These bots can hold a dialogue in natural language indistinguishable from that of humans and are a potential source of information for patients. The aim of this study is to examine the performance of these bots in answering specific questions related to orthopaedic surgery and traumatology, using questions from the Spanish MIR exam administered between 2008 and 2023.
Material and methods: Three chatbot models (ChatGPT, Bard and Perplexity) were analyzed by having them answer 114 MIR questions. Their accuracy was compared, the readability of their responses was evaluated, and their reliance on logical reasoning and on internal and external information was examined. The types of errors made in incorrect answers were also assessed.
Results: ChatGPT answered 72.81% of the questions correctly, followed by Perplexity (67.54%) and Bard (60.53%). Bard provided the most readable and comprehensive responses. The responses demonstrated logical reasoning and use of internal information from the question prompts. On 16 questions (14%), all three applications failed simultaneously. The errors identified included both logical failures and information failures.
Conclusions: While conversational bots can be useful in resolving medical questions, caution is advised given the possibility of errors. At present, they should be regarded as a tool still under development, and human judgment should prevail over generative artificial intelligence.
About the journal:
It is an excellent journal for accessing the best research articles in the specialty and the clinical cases of greatest interest. It is also the Official Publication of the Society and is included in prestigious medical reference indexes.