利用 ChatGPT 作为科学推理引擎，区分相互矛盾的证据，总结有争议的临床问题所面临的挑战。

Journal of the American Medical Informatics Association : JAMIA Pub Date : 2024-05-17 DOI:10.1093/jamia/ocae100

Shiyao Xie, Wenjing Zhao, Guanghui Deng, Guohua He, Na He, Zhenhua Lu, Weihua Hu, Mingming Zhao, Jian Du

{"title":"利用 ChatGPT 作为科学推理引擎，区分相互矛盾的证据，总结有争议的临床问题所面临的挑战。","authors":"Shiyao Xie, Wenjing Zhao, Guanghui Deng, Guohua He, Na He, Zhenhua Lu, Weihua Hu, Mingming Zhao, Jian Du","doi":"10.1093/jamia/ocae100","DOIUrl":null,"url":null,"abstract":"OBJECTIVE\nSynthesizing and evaluating inconsistent medical evidence is essential in evidence-based medicine. This study aimed to employ ChatGPT as a sophisticated scientific reasoning engine to identify conflicting clinical evidence and summarize unresolved questions to inform further research.\n\n\nMATERIALS AND METHODS\nWe evaluated ChatGPT's effectiveness in identifying conflicting evidence and investigated its principles of logical reasoning. An automated framework was developed to generate a PubMed dataset focused on controversial clinical topics. ChatGPT analyzed this dataset to identify consensus and controversy, and to formulate unsolved research questions. Expert evaluations were conducted 1) on the consensus and controversy for factual consistency, comprehensiveness, and potential harm and, 2) on the research questions for relevance, innovation, clarity, and specificity.\n\n\nRESULTS\nThe gpt-4-1106-preview model achieved a 90% recall rate in detecting inconsistent claim pairs within a ternary assertions setup. Notably, without explicit reasoning prompts, ChatGPT provided sound reasoning for the assertions between claims and hypotheses, based on an analysis grounded in relevance, specificity, and certainty. ChatGPT's conclusions of consensus and controversies in clinical literature were comprehensive and factually consistent. The research questions proposed by ChatGPT received high expert ratings.\n\n\nDISCUSSION\nOur experiment implies that, in evaluating the relationship between evidence and claims, ChatGPT considered more detailed information beyond a straightforward assessment of sentimental orientation. This ability to process intricate information and conduct scientific reasoning regarding sentiment is noteworthy, particularly as this pattern emerged without explicit guidance or directives in prompts, highlighting ChatGPT's inherent logical reasoning capabilities.\n\n\nCONCLUSION\nThis study demonstrated ChatGPT's capacity to evaluate and interpret scientific claims. Such proficiency can be generalized to broader clinical research literature. ChatGPT effectively aids in facilitating clinical studies by proposing unresolved challenges based on analysis of existing studies. However, caution is advised as ChatGPT's outputs are inferences drawn from the input literature and could be harmful to clinical practice.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"9 41","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Utilizing ChatGPT as a scientific reasoning engine to differentiate conflicting evidence and summarize challenges in controversial clinical questions.\",\"authors\":\"Shiyao Xie, Wenjing Zhao, Guanghui Deng, Guohua He, Na He, Zhenhua Lu, Weihua Hu, Mingming Zhao, Jian Du\",\"doi\":\"10.1093/jamia/ocae100\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"OBJECTIVE\\nSynthesizing and evaluating inconsistent medical evidence is essential in evidence-based medicine. This study aimed to employ ChatGPT as a sophisticated scientific reasoning engine to identify conflicting clinical evidence and summarize unresolved questions to inform further research.\\n\\n\\nMATERIALS AND METHODS\\nWe evaluated ChatGPT's effectiveness in identifying conflicting evidence and investigated its principles of logical reasoning. An automated framework was developed to generate a PubMed dataset focused on controversial clinical topics. ChatGPT analyzed this dataset to identify consensus and controversy, and to formulate unsolved research questions. Expert evaluations were conducted 1) on the consensus and controversy for factual consistency, comprehensiveness, and potential harm and, 2) on the research questions for relevance, innovation, clarity, and specificity.\\n\\n\\nRESULTS\\nThe gpt-4-1106-preview model achieved a 90% recall rate in detecting inconsistent claim pairs within a ternary assertions setup. Notably, without explicit reasoning prompts, ChatGPT provided sound reasoning for the assertions between claims and hypotheses, based on an analysis grounded in relevance, specificity, and certainty. ChatGPT's conclusions of consensus and controversies in clinical literature were comprehensive and factually consistent. The research questions proposed by ChatGPT received high expert ratings.\\n\\n\\nDISCUSSION\\nOur experiment implies that, in evaluating the relationship between evidence and claims, ChatGPT considered more detailed information beyond a straightforward assessment of sentimental orientation. This ability to process intricate information and conduct scientific reasoning regarding sentiment is noteworthy, particularly as this pattern emerged without explicit guidance or directives in prompts, highlighting ChatGPT's inherent logical reasoning capabilities.\\n\\n\\nCONCLUSION\\nThis study demonstrated ChatGPT's capacity to evaluate and interpret scientific claims. Such proficiency can be generalized to broader clinical research literature. ChatGPT effectively aids in facilitating clinical studies by proposing unresolved challenges based on analysis of existing studies. However, caution is advised as ChatGPT's outputs are inferences drawn from the input literature and could be harmful to clinical practice.\",\"PeriodicalId\":236137,\"journal\":{\"name\":\"Journal of the American Medical Informatics Association : JAMIA\",\"volume\":\"9 41\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the American Medical Informatics Association : JAMIA\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jamia/ocae100\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association : JAMIA","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jamia/ocae100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

目的合成和评估不一致的医学证据对于循证医学至关重要。本研究旨在使用 ChatGPT 作为一个复杂的科学推理引擎，来识别相互矛盾的临床证据，并总结未解决的问题，为进一步的研究提供信息。我们开发了一个自动框架来生成一个 PubMed 数据集，重点关注有争议的临床话题。ChatGPT 通过分析该数据集来识别共识和争议，并提出尚未解决的研究问题。专家评估的内容包括：1）共识和争议的事实一致性、全面性和潜在危害；2）研究问题的相关性、创新性、清晰性和具体性。结果gpt-4-1106-preview 模型在检测三元断言设置中不一致的断言对方面达到了 90% 的召回率。值得注意的是，在没有明确推理提示的情况下，ChatGPT 基于相关性、具体性和确定性的分析，为主张和假设之间的断言提供了合理的推理。ChatGPT 对临床文献中的共识和争议得出的结论是全面的，与事实相符。我们的实验表明，在评估证据和主张之间的关系时，ChatGPT 除了直接评估情感取向外，还考虑了更多详细信息。这种处理复杂信息并对情感进行科学推理的能力值得注意，尤其是这种模式是在没有明确指导或提示的情况下出现的，突出了 ChatGPT 固有的逻辑推理能力。这种能力可以推广到更广泛的临床研究文献中。ChatGPT 可以在分析现有研究的基础上提出尚未解决的难题，从而有效促进临床研究。不过，由于 ChatGPT 的输出结果是从输入文献中得出的推论，可能对临床实践有害，因此建议谨慎使用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Utilizing ChatGPT as a scientific reasoning engine to differentiate conflicting evidence and summarize challenges in controversial clinical questions.

OBJECTIVE Synthesizing and evaluating inconsistent medical evidence is essential in evidence-based medicine. This study aimed to employ ChatGPT as a sophisticated scientific reasoning engine to identify conflicting clinical evidence and summarize unresolved questions to inform further research. MATERIALS AND METHODS We evaluated ChatGPT's effectiveness in identifying conflicting evidence and investigated its principles of logical reasoning. An automated framework was developed to generate a PubMed dataset focused on controversial clinical topics. ChatGPT analyzed this dataset to identify consensus and controversy, and to formulate unsolved research questions. Expert evaluations were conducted 1) on the consensus and controversy for factual consistency, comprehensiveness, and potential harm and, 2) on the research questions for relevance, innovation, clarity, and specificity. RESULTS The gpt-4-1106-preview model achieved a 90% recall rate in detecting inconsistent claim pairs within a ternary assertions setup. Notably, without explicit reasoning prompts, ChatGPT provided sound reasoning for the assertions between claims and hypotheses, based on an analysis grounded in relevance, specificity, and certainty. ChatGPT's conclusions of consensus and controversies in clinical literature were comprehensive and factually consistent. The research questions proposed by ChatGPT received high expert ratings. DISCUSSION Our experiment implies that, in evaluating the relationship between evidence and claims, ChatGPT considered more detailed information beyond a straightforward assessment of sentimental orientation. This ability to process intricate information and conduct scientific reasoning regarding sentiment is noteworthy, particularly as this pattern emerged without explicit guidance or directives in prompts, highlighting ChatGPT's inherent logical reasoning capabilities. CONCLUSION This study demonstrated ChatGPT's capacity to evaluate and interpret scientific claims. Such proficiency can be generalized to broader clinical research literature. ChatGPT effectively aids in facilitating clinical studies by proposing unresolved challenges based on analysis of existing studies. However, caution is advised as ChatGPT's outputs are inferences drawn from the input literature and could be harmful to clinical practice.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of the American Medical Informatics Association : JAMIA

自引率

0.00%

发文量