Can artificial intelligence replace biochemists? A study comparing interpretation of thyroid function test results by ChatGPT and Google Bard to practising biochemists.

IF 2.1 | CAS Zone 4 (Medicine) | JCR Q3, Medical Laboratory Technology
Annals of Clinical Biochemistry. Pub date: 2024-03-01; Epub date: 2023-09-20; pp. 143-149. DOI: 10.1177/00045632231203473
Emma Stevenson, Chelsey Walsh, Luke Hibberd

Abstract

Background: Public awareness of artificial intelligence (AI) is increasing, and this novel technology is being used for a range of everyday tasks as well as for more specialist clinical applications. Against a background of lengthening waits for GP appointments and of patients' access to their own laboratory test results through the NHS app, this study aimed to assess the accuracy and safety of two AI tools, ChatGPT and Google Bard, in interpreting thyroid function test results when queried as if by laboratory scientists or by patients.

Methods: Fifteen fictional cases were presented to a team of clinicians and clinical scientists, who produced a consensus opinion for each. The cases were then presented to ChatGPT and Google Bard as though posed by healthcare providers and by patients. Each response was categorized as correct, partially correct or incorrect relative to the consensus opinion, and the advice given was assessed for patient safety.

Results: Of the 15 cases presented, ChatGPT and Google Bard correctly interpreted only 33.3% and 20.0% of cases, respectively. When the queries were posed as if by a patient, 66.7% of ChatGPT responses were safe, compared with 60.0% of Google Bard responses. Both AI tools were able to identify primary hypothyroidism and hyperthyroidism but failed to identify subclinical presentations, non-thyroidal illness or secondary hypothyroidism.
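With only 15 cases, each reported percentage corresponds to a small whole number of cases. The sketch below back-calculates those counts. It assumes, since the abstract does not state it explicitly, that every percentage is taken over all 15 cases; the helper name `implied_count` is hypothetical and not part of the study.

```python
# Hypothetical helper (not from the study): convert the reported percentages
# back into case counts, ASSUMING each percentage uses all 15 cases as denominator.
TOTAL_CASES = 15

reported = {
    "ChatGPT correct": 33.3,
    "Google Bard correct": 20.0,
    "ChatGPT safe (patient query)": 66.7,
    "Google Bard safe (patient query)": 60.0,
}


def implied_count(percent: float, total: int = TOTAL_CASES) -> int:
    """Return the whole number of cases closest to `percent` of `total`."""
    return round(percent / 100 * total)


for label, pct in reported.items():
    n = implied_count(pct)
    # Recompute the percentage from the count to confirm it matches to one decimal place.
    check = round(100 * n / TOTAL_CASES, 1)
    print(f"{label}: {n}/{TOTAL_CASES} = {check}%")
```

Under that assumption, ChatGPT interpreted 5 of 15 cases correctly and Google Bard 3 of 15, while 10 and 9 of 15 patient-framed responses, respectively, were judged safe.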

Conclusions: This study has demonstrated that AI tools cannot currently generate consistently correct interpretations or consistently safe advice for patients, and they should not be used as an alternative to consultation with a qualified medical professional. In their current form, available AI tools cannot replace human clinical knowledge in this setting.

Source journal
Annals of Clinical Biochemistry
Subject category: Biochemistry, Genetics and Molecular Biology - Clinical Biochemistry
CiteScore: 5.20
Self-citation rate: 4.50%
Articles published: 61
About the journal: Annals of Clinical Biochemistry is the fully peer-reviewed international journal of the Association for Clinical Biochemistry and Laboratory Medicine. Annals of Clinical Biochemistry accepts papers that contribute to knowledge in all fields of laboratory medicine, especially those pertaining to the understanding, diagnosis and treatment of human disease. It publishes papers on clinical biochemistry, clinical audit, metabolic medicine, immunology, genetics, biotechnology, haematology, microbiology, computing and management where they have both biochemical and clinical relevance. Papers describing the evaluation or implementation of commercial reagent kits or the performance of new analysers require substantial original information. Unless of exceptional interest and novelty, studies dealing with the redox status in various diseases are not generally considered within the journal's scope. Studies documenting the association of single nucleotide polymorphisms (SNPs) with particular phenotypes will not normally be considered, given the greater strength of genome-wide association studies (GWAS). Research undertaken in non-human animals will not be considered for publication in the Annals. Annals of Clinical Biochemistry is also the official journal of NVKC (de Nederlandse Vereniging voor Klinische Chemie) and JSCC (Japan Society of Clinical Chemistry).