Artificial Intelligence for Urology Research: The Holy Grail of Data Science or Pandora's Box of Misinformation?

IF 2.9 | CAS Tier 2 (Medicine) | JCR Q1 (Urology & Nephrology)
Journal of Endourology | Pub Date: 2024-08-01 | Epub Date: 2024-05-15 | DOI: 10.1089/end.2023.0703
Ryan M Blake, Johnathan A Khusid
{"title":"人工智能用于泌尿学研究:数据科学的圣杯还是错误信息的潘多拉魔盒?","authors":"Ryan M Blake, Johnathan A Khusid","doi":"10.1089/end.2023.0703","DOIUrl":null,"url":null,"abstract":"<p><p><b><i>Introduction:</i></b> Artificial intelligence tools such as the large language models (LLMs) Bard and ChatGPT have generated significant research interest. Utilization of these LLMs to study the epidemiology of a target population could benefit urologists. We investigated whether Bard and ChatGPT can perform a large-scale calculation of the incidence and prevalence of kidney stone disease. <b><i>Materials and Methods:</i></b> We obtained reference values from two published studies, which used the National Health and Nutrition Examination Survey (NHANES) database to calculate the prevalence and incidence of kidney stone disease. We then tested the capability of Bard and ChatGPT to perform similar calculations using two different methods. First, we instructed the LLMs to access the data sets and independently perform the calculation. Second, we instructed the interfaces to generate a customized computer code, which could perform the calculation on downloaded data sets. <b><i>Results:</i></b> While ChatGPT denied the ability to access and perform calculations on the NHANES database, Bard intermittently claimed the ability to do so. Bard provided either accurate results or inaccurate and inconsistent results. For example, Bard's \"calculations\" for the incidence of kidney stones from 2015 to 2018 were 2.1% (95% CI 1.5-2.7), 1.75% (95% CI 1.6-1.9), and 0.8% (95% CI 0.7-0.9), while the published number was 2.1% (95% CI 1.5-2.7). Bard provided discrete mathematical details of its calculations, however, when prompted further, admitted to having obtained the numbers from online sources, including our chosen reference articles, rather than from a <i>de novo</i> calculation. Both LLMs were able to produce a code (Python) to use on the downloaded NHANES data sets, however, these would not readily execute. <b><i>Conclusions:</i></b> ChatGPT and Bard are currently incapable of performing epidemiologic calculations and lack transparency and accountability. Caution should be used, particularly with Bard, as claims of its capabilities were convincingly misleading, and results were inconsistent.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Artificial Intelligence for Urology Research: The Holy Grail of Data Science or Pandora's Box of Misinformation?\",\"authors\":\"Ryan M Blake, Johnathan A Khusid\",\"doi\":\"10.1089/end.2023.0703\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p><b><i>Introduction:</i></b> Artificial intelligence tools such as the large language models (LLMs) Bard and ChatGPT have generated significant research interest. Utilization of these LLMs to study the epidemiology of a target population could benefit urologists. We investigated whether Bard and ChatGPT can perform a large-scale calculation of the incidence and prevalence of kidney stone disease. <b><i>Materials and Methods:</i></b> We obtained reference values from two published studies, which used the National Health and Nutrition Examination Survey (NHANES) database to calculate the prevalence and incidence of kidney stone disease. 
We then tested the capability of Bard and ChatGPT to perform similar calculations using two different methods. First, we instructed the LLMs to access the data sets and independently perform the calculation. Second, we instructed the interfaces to generate a customized computer code, which could perform the calculation on downloaded data sets. <b><i>Results:</i></b> While ChatGPT denied the ability to access and perform calculations on the NHANES database, Bard intermittently claimed the ability to do so. Bard provided either accurate results or inaccurate and inconsistent results. For example, Bard's \\\"calculations\\\" for the incidence of kidney stones from 2015 to 2018 were 2.1% (95% CI 1.5-2.7), 1.75% (95% CI 1.6-1.9), and 0.8% (95% CI 0.7-0.9), while the published number was 2.1% (95% CI 1.5-2.7). Bard provided discrete mathematical details of its calculations, however, when prompted further, admitted to having obtained the numbers from online sources, including our chosen reference articles, rather than from a <i>de novo</i> calculation. Both LLMs were able to produce a code (Python) to use on the downloaded NHANES data sets, however, these would not readily execute. <b><i>Conclusions:</i></b> ChatGPT and Bard are currently incapable of performing epidemiologic calculations and lack transparency and accountability. Caution should be used, particularly with Bard, as claims of its capabilities were convincingly misleading, and results were inconsistent.</p>\",\"PeriodicalId\":15723,\"journal\":{\"name\":\"Journal of endourology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of endourology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1089/end.2023.0703\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/5/15 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"UROLOGY & NEPHROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of endourology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1089/end.2023.0703","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/5/15 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Citations: 0

Abstract


Introduction: Artificial intelligence tools such as the large language models (LLMs) Bard and ChatGPT have generated significant research interest. Utilization of these LLMs to study the epidemiology of a target population could benefit urologists. We investigated whether Bard and ChatGPT can perform a large-scale calculation of the incidence and prevalence of kidney stone disease.

Materials and Methods: We obtained reference values from two published studies, which used the National Health and Nutrition Examination Survey (NHANES) database to calculate the prevalence and incidence of kidney stone disease. We then tested the capability of Bard and ChatGPT to perform similar calculations using two different methods. First, we instructed the LLMs to access the data sets and independently perform the calculation. Second, we instructed the interfaces to generate customized computer code that could perform the calculation on downloaded data sets.

Results: While ChatGPT denied the ability to access and perform calculations on the NHANES database, Bard intermittently claimed the ability to do so. Bard provided results that were sometimes accurate and sometimes inaccurate and inconsistent. For example, Bard's "calculations" for the incidence of kidney stones from 2015 to 2018 were 2.1% (95% CI 1.5-2.7), 1.75% (95% CI 1.6-1.9), and 0.8% (95% CI 0.7-0.9), while the published number was 2.1% (95% CI 1.5-2.7). Bard provided discrete mathematical details of its calculations; however, when prompted further, it admitted to having obtained the numbers from online sources, including our chosen reference articles, rather than from a de novo calculation. Both LLMs were able to produce Python code to run on the downloaded NHANES data sets; however, this code would not readily execute.

Conclusions: ChatGPT and Bard are currently incapable of performing epidemiologic calculations and lack transparency and accountability. Caution should be used, particularly with Bard, as claims of its capabilities were convincingly misleading and its results were inconsistent.
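To make the second method concrete, the following is a minimal, hypothetical sketch of the kind of Python script the authors asked the LLMs to generate: a survey-weighted prevalence estimate of kidney stone history computed from downloaded NHANES files. The file and variable names (KIQ_U_J.XPT, DEMO_J.XPT, SEQN, KIQ026, WTINT2YR) are assumptions drawn from public NHANES 2017-2018 documentation, not code taken from the study itself.

import pandas as pd

# Hypothetical example using the NHANES 2017-2018 cycle: the kidney conditions
# questionnaire (KIQ_U_J.XPT) and the demographics file (DEMO_J.XPT), both
# assumed to have been downloaded from the CDC NHANES website beforehand.
kiq = pd.read_sas("KIQ_U_J.XPT", format="xport")
demo = pd.read_sas("DEMO_J.XPT", format="xport")

# Join on the respondent sequence number to attach interview weights.
df = kiq.merge(demo[["SEQN", "WTINT2YR"]], on="SEQN")

# KIQ026: "Ever had kidney stones?" (1 = Yes, 2 = No; 7/9 = Refused/Don't know).
valid = df[df["KIQ026"].isin([1, 2])]

# WTINT2YR: full-sample two-year interview weights.
weighted_yes = valid.loc[valid["KIQ026"] == 1, "WTINT2YR"].sum()
weighted_total = valid["WTINT2YR"].sum()

prevalence = weighted_yes / weighted_total
print(f"Weighted kidney stone prevalence, 2017-2018: {prevalence:.1%}")

A point estimate like this is only part of what the reference studies report: NHANES confidence intervals require the complex survey design variables (strata and primary sampling units) and a dedicated survey-analysis package, which is one reason a short LLM-generated script can look plausible yet fall short of a published epidemiologic analysis.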

Source journal
Journal of Endourology (Medicine — Urology & Nephrology)
CiteScore: 5.50
Self-citation rate: 14.80%
Articles per year: 254
Time to first review: 1 month

About the journal: Journal of Endourology, JE Case Reports, and Videourology are the leading peer-reviewed journal, case reports publication, and innovative videojournal companion covering all aspects of minimally invasive urology research, applications, and clinical outcomes. The leading journal of minimally invasive urology for over 30 years, Journal of Endourology is the essential publication for practicing surgeons who want to keep up with the latest surgical technologies in endoscopic, laparoscopic, robotic, and image-guided procedures as they apply to benign and malignant diseases of the genitourinary tract. This flagship journal includes the companion videojournal Videourology™ with every subscription. While Journal of Endourology remains focused on publishing rigorously peer-reviewed articles, Videourology accepts original videos containing material that has not been reported elsewhere, except in the form of an abstract or a conference presentation. Journal of Endourology coverage includes:
- The latest laparoscopic, robotic, endoscopic, and image-guided techniques for treating both benign and malignant conditions
- Pioneering research articles
- Controversial cases in endourology
- Techniques in endourology with accompanying videos
- Reviews and epochs in endourology
- Endourology survey section of endourology-relevant manuscripts published in other journals