Yi Hu, Yongkang Lai, Foqiang Liao, Xu Shu, Yin Zhu, Yi-Qi Du, Nong-Hua Lu, National Clinical Research Center for Digestive Diseases (Shanghai)
{"title":"Assessing Accuracy of ChatGPT on Addressing Helicobacter pylori Infection-Related Questions: A National Survey and Comparative Study","authors":"Yi Hu, Yongkang Lai, Foqiang Liao, Xu Shu, Yin Zhu, Yi-Qi Du, Nong-Hua Lu, National Clinical Research Center for Digestive Diseases (Shanghai)","doi":"10.1111/hel.13116","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>ChatGPT is a novel and online large-scale language model used as a source providing up-to-date and useful health-related knowledges to patients and clinicians. However, its performance on <i>Helicobacter pylori</i> infection-related questions remain unknown. This study aimed to evaluate the accuracy of ChatGPT's responses on <i>H. pylori</i>-related questions compared with that of gastroenterologists during the same period.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>Twenty-five <i>H. pylori</i>-related questions from five domains: Indication, Diagnostics, Treatment, Gastric cancer and prevention, and Gut Microbiota were selected based on the Maastricht VI Consensus report. Each question was tested three times with ChatGPT3.5 and ChatGPT4. Two independent <i>H. pylori</i> experts assessed the responses from ChatGPT, with discrepancies resolved by a third reviewer. Simultaneously, a nationwide survey with the same questions was conducted among 1279 gastroenterologists and 154 medical students. The accuracy of responses from ChatGPT3.5 and ChatGPT4 was compared with that of gastroenterologists.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>Overall, both ChatGPT3.5 and ChatGPT4 demonstrated high accuracy, with median accuracy rates of 92% for each of the three responses, surpassing the accuracy of nationwide gastroenterologists (median: 80%) and equivalent to that of senior gastroenterologists. Compared with ChatGPT3.5, ChatGPT4 provided more concise responses with the same accuracy. 
ChatGPT3.5 performed well in the Indication, Treatment, and Gut Microbiota domains, whereas ChatGPT4 excelled in Diagnostics, Gastric cancer and prevention, and Gut Microbiota domains.</p>\n </section>\n \n <section>\n \n <h3> Conclusion</h3>\n \n <p>ChatGPT exhibited high accuracy and reproducibility in addressing <i>H. pylori</i>-related questions except the decision for <i>H. pylori</i> treatment, performing at the level of senior gastroenterologists and could serve as an auxiliary information tool for assisting patients and clinicians.</p>\n </section>\n </div>","PeriodicalId":13223,"journal":{"name":"Helicobacter","volume":"29 4","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Helicobacter","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/hel.13116","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GASTROENTEROLOGY & HEPATOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Background
ChatGPT is a novel online large language model that can serve as a source of up-to-date, useful health-related knowledge for patients and clinicians. However, its performance on Helicobacter pylori infection-related questions remains unknown. This study aimed to evaluate the accuracy of ChatGPT's responses to H. pylori-related questions compared with that of gastroenterologists surveyed during the same period.
Methods
Twenty-five H. pylori-related questions covering five domains (Indication, Diagnostics, Treatment, Gastric cancer and prevention, and Gut Microbiota) were selected based on the Maastricht VI Consensus report. Each question was posed three times to ChatGPT3.5 and ChatGPT4. Two independent H. pylori experts assessed ChatGPT's responses, with discrepancies resolved by a third reviewer. Simultaneously, a nationwide survey with the same questions was conducted among 1279 gastroenterologists and 154 medical students. The accuracy of the responses from ChatGPT3.5 and ChatGPT4 was compared with that of the gastroenterologists.
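The scoring scheme described above (two independent reviewers, with a third reviewer resolving disagreements) can be sketched as follows; the function name and the example scores are illustrative assumptions, not the study's actual data or code.

```python
# Hypothetical sketch of two-reviewer scoring with third-reviewer
# adjudication, as described in the Methods. 1 = accurate, 0 = inaccurate.

def adjudicate(score_a: int, score_b: int, score_c: int) -> int:
    """Return the agreed score: if the two independent reviewers concur,
    keep their score; otherwise the third reviewer's score decides."""
    return score_a if score_a == score_b else score_c

# Example: three responses, with reviewers disagreeing on the last two.
triples = [(1, 1, 1), (1, 0, 0), (0, 1, 1)]
final_scores = [adjudicate(a, b, c) for a, b, c in triples]
# final_scores == [1, 0, 1]
```

This mirrors a common consensus-rating design in survey studies, where the adjudicator is consulted only for discordant ratings.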
Results
Overall, both ChatGPT3.5 and ChatGPT4 demonstrated high accuracy, with a median accuracy rate of 92% for each of the three response rounds, surpassing the accuracy of nationwide gastroenterologists (median: 80%) and matching that of senior gastroenterologists. Compared with ChatGPT3.5, ChatGPT4 provided more concise responses with the same accuracy. ChatGPT3.5 performed well in the Indication, Treatment, and Gut Microbiota domains, whereas ChatGPT4 excelled in the Diagnostics, Gastric cancer and prevention, and Gut Microbiota domains.
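The per-round accuracy rates and their median, as reported above, could be computed along these lines; the scores below are made-up placeholders (five questions shown instead of the study's twenty-five), not the study's data.

```python
# Illustrative computation of per-round accuracy and the median across the
# three ChatGPT response rounds; all numbers here are hypothetical.
from statistics import median

# 1 = correct, 0 = incorrect, one inner list per response round.
rounds = [
    [1, 1, 0, 1, 1],  # round 1
    [1, 1, 1, 1, 0],  # round 2
    [1, 0, 1, 1, 1],  # round 3
]
accuracy_per_round = [sum(r) / len(r) for r in rounds]  # 0.8 per round here
median_accuracy = median(accuracy_per_round)            # 0.8
```

Reporting the median across rounds, rather than a single run, reflects the study's check on reproducibility: each question was asked three separate times.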
Conclusion
ChatGPT exhibited high accuracy and reproducibility in addressing H. pylori-related questions, except for H. pylori treatment decisions, performing at the level of senior gastroenterologists; it could serve as an auxiliary information tool for assisting patients and clinicians.
Journal Introduction:
Helicobacter is edited by Professor David Y Graham. The editorial and peer review process is independent; whenever there is a conflict of interest, the editor and editorial board will declare their interests and affiliations. Helicobacter recognises the critical role established for Helicobacter pylori in peptic ulcer, gastric adenocarcinoma, and primary gastric lymphoma. As new Helicobacter species are now regularly being discovered, the journal covers the entire range of Helicobacter research, increasing communication among the fields of gastroenterology, microbiology, vaccine development, and laboratory animal science.