{"title":"比较临床问题答案的可用性和可靠性:人工智能生成的 ChatGPT 与人工撰写的资源的可用性和可靠性比较。","authors":"Farrin A Manian, Katherine Garland, Jimin Ding","doi":"10.14423/SMJ.0000000000001715","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>Our aim was to compare the usability and reliability of answers to clinical questions posed of Chat-Generative Pre-Trained Transformer (ChatGPT) compared to those of a human-authored Web source (www.Pearls4Peers.com) in response to \"real-world\" clinical questions raised during the care of patients.</p><p><strong>Methods: </strong>Two domains of clinical information quality were studied: usability, based on organization/readability, relevance, and usefulness, and reliability, based on clarity, accuracy, and thoroughness. The top 36 most viewed real-world questions from a human-authored Web site (www.Pearls4Peers.com [P4P]) were posed to ChatGPT 3.5. Anonymized answers by ChatGPT and P4P (without literature citations) were separately assessed for usability by 18 practicing physicians (\"clinician users\") in triplicate and for reliability by 21 expert providers (\"content experts\") on a Likert scale (\"definitely yes,\" \"generally yes,\" or \"no\") in duplicate or triplicate. Participants also directly compared the usability and reliability of paired answers.</p><p><strong>Results: </strong>The usability and reliability of ChatGPT answers varied widely depending on the question posed. ChatGPT answers were not considered useful or accurate in 13.9% and 13.1% of cases, respectively. In within-individual rankings for usability, ChatGPT was inferior to P4P in organization/readability, relevance, and usefulness in 29.6%, 28.3%, and 29.6% of cases, respectively, and for reliability, inferior to P4P in clarity, accuracy, and thoroughness in 38.1%, 34.5%, and 31% of cases, respectively.</p><p><strong>Conclusions: </strong>The quality of ChatGPT responses to real-world clinical questions varied widely, with nearly one-third or more answers considered inferior to a human-authored source in several aspects of usability and reliability. Caution is advised when using ChatGPT in clinical decision making.</p>","PeriodicalId":22043,"journal":{"name":"Southern Medical Journal","volume":"117 8","pages":"467-473"},"PeriodicalIF":1.0000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparison of the Usability and Reliability of Answers to Clinical Questions: AI-Generated ChatGPT versus a Human-Authored Resource.\",\"authors\":\"Farrin A Manian, Katherine Garland, Jimin Ding\",\"doi\":\"10.14423/SMJ.0000000000001715\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>Our aim was to compare the usability and reliability of answers to clinical questions posed of Chat-Generative Pre-Trained Transformer (ChatGPT) compared to those of a human-authored Web source (www.Pearls4Peers.com) in response to \\\"real-world\\\" clinical questions raised during the care of patients.</p><p><strong>Methods: </strong>Two domains of clinical information quality were studied: usability, based on organization/readability, relevance, and usefulness, and reliability, based on clarity, accuracy, and thoroughness. The top 36 most viewed real-world questions from a human-authored Web site (www.Pearls4Peers.com [P4P]) were posed to ChatGPT 3.5. 
Anonymized answers by ChatGPT and P4P (without literature citations) were separately assessed for usability by 18 practicing physicians (\\\"clinician users\\\") in triplicate and for reliability by 21 expert providers (\\\"content experts\\\") on a Likert scale (\\\"definitely yes,\\\" \\\"generally yes,\\\" or \\\"no\\\") in duplicate or triplicate. Participants also directly compared the usability and reliability of paired answers.</p><p><strong>Results: </strong>The usability and reliability of ChatGPT answers varied widely depending on the question posed. ChatGPT answers were not considered useful or accurate in 13.9% and 13.1% of cases, respectively. In within-individual rankings for usability, ChatGPT was inferior to P4P in organization/readability, relevance, and usefulness in 29.6%, 28.3%, and 29.6% of cases, respectively, and for reliability, inferior to P4P in clarity, accuracy, and thoroughness in 38.1%, 34.5%, and 31% of cases, respectively.</p><p><strong>Conclusions: </strong>The quality of ChatGPT responses to real-world clinical questions varied widely, with nearly one-third or more answers considered inferior to a human-authored source in several aspects of usability and reliability. Caution is advised when using ChatGPT in clinical decision making.</p>\",\"PeriodicalId\":22043,\"journal\":{\"name\":\"Southern Medical Journal\",\"volume\":\"117 8\",\"pages\":\"467-473\"},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2024-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Southern Medical Journal\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.14423/SMJ.0000000000001715\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MEDICINE, GENERAL & INTERNAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Southern Medical Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.14423/SMJ.0000000000001715","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Comparison of the Usability and Reliability of Answers to Clinical Questions: AI-Generated ChatGPT versus a Human-Authored Resource.
Objectives: Our aim was to compare the usability and reliability of answers provided by Chat-Generative Pre-Trained Transformer (ChatGPT) with those of a human-authored Web source (www.Pearls4Peers.com) in response to "real-world" clinical questions raised during the care of patients.
Methods: Two domains of clinical information quality were studied: usability (organization/readability, relevance, and usefulness) and reliability (clarity, accuracy, and thoroughness). The top 36 most viewed real-world questions from a human-authored Web site (www.Pearls4Peers.com [P4P]) were posed to ChatGPT 3.5. Anonymized answers by ChatGPT and P4P (without literature citations) were separately assessed for usability by 18 practicing physicians ("clinician users") in triplicate and for reliability by 21 expert providers ("content experts") on a Likert scale ("definitely yes," "generally yes," or "no") in duplicate or triplicate. Participants also directly compared the usability and reliability of paired answers.
Results: The usability and reliability of ChatGPT answers varied widely depending on the question posed. ChatGPT answers were not considered useful or accurate in 13.9% and 13.1% of cases, respectively. In within-individual rankings for usability, ChatGPT was inferior to P4P in organization/readability, relevance, and usefulness in 29.6%, 28.3%, and 29.6% of cases, respectively, and for reliability, inferior to P4P in clarity, accuracy, and thoroughness in 38.1%, 34.5%, and 31.0% of cases, respectively.
Conclusions: The quality of ChatGPT responses to real-world clinical questions varied widely, with nearly one-third or more answers considered inferior to a human-authored source in several aspects of usability and reliability. Caution is advised when using ChatGPT in clinical decision making.
Journal Introduction:
As the official journal of the Birmingham, Alabama-based Southern Medical Association (SMA), the Southern Medical Journal (SMJ) has for more than 100 years provided the latest clinical information in areas that affect patients' daily lives. Now delivered to individuals exclusively online, the SMJ has a multidisciplinary focus that covers a broad range of topics relevant to physicians and other healthcare specialists across the profession, including medicine and medical specialties; surgery and surgical specialties; child and maternal health; mental health; emergency and disaster medicine; public health and environmental medicine; bioethics and medical education; and quality health care, patient safety, and best practices. Each month, articles span the spectrum of medical topics, providing timely, up-to-the-minute information for both primary care physicians and specialists. Contributors include leaders in the healthcare field from across the country and around the world. The SMJ enables physicians to provide the best possible care to patients in this age of rapidly changing modern medicine.