Artificial Intelligence to Automate Network Meta-Analyses: Four Case Studies to Evaluate the Potential Application of Large Language Models

Tim Reason, Emma Benbow, Julia Langham, Andy Gimblett, Sven L Klijn, Bill Malcolm

PharmacoEconomics Open (2024), pp. 205-220. Epub 10 February 2024. DOI: 10.1007/s41669-024-00476-9. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10884375/pdf/
Citations: 0
Abstract
Background: The emergence of artificial intelligence, capable of human-level performance on some tasks, presents an opportunity to revolutionise development of systematic reviews and network meta-analyses (NMAs). In this pilot study, we aim to assess use of a large-language model (LLM, Generative Pre-trained Transformer 4 [GPT-4]) to automatically extract data from publications, write an R script to conduct an NMA and interpret the results.
Methods: We considered four case studies involving binary and time-to-event outcomes in two disease areas, for which an NMA had previously been conducted manually. For each case study, a Python script was developed that communicated with the LLM via application programming interface (API) calls. The LLM was prompted to extract relevant data from publications, to create an R script to be used to run the NMA and then to produce a small report describing the analysis.
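The paper does not publish its driver script, but the workflow it describes can be sketched in outline: a Python script that sends prompts to the LLM over an API, parses the extracted data, and collects the generated R script and report. All function names, prompt wording, and the stubbed LLM response below are illustrative assumptions, not the authors' actual code; a real run would replace the stub with a call to an LLM API such as GPT-4's.

```python
# Hypothetical sketch of the pipeline described in the Methods: prompt an LLM
# to (1) extract trial data as JSON, (2) write an R script for the NMA, and
# (3) draft a short report. Prompts and names are illustrative only.
import json
from typing import Callable

EXTRACTION_PROMPT = (
    "Extract, as JSON, the trial name, treatment arms, and the number of "
    "events and patients per arm from the following publication text:\n{text}"
)

R_SCRIPT_PROMPT = (
    "Write an R script that runs a network meta-analysis "
    "on this JSON data set:\n{data}"
)

def run_pipeline(publications: list[str],
                 call_llm: Callable[[str], str]) -> dict:
    """Drive the three steps: extract data, generate R code, summarise."""
    extracted = [json.loads(call_llm(EXTRACTION_PROMPT.format(text=t)))
                 for t in publications]
    r_script = call_llm(R_SCRIPT_PROMPT.format(data=json.dumps(extracted)))
    report = call_llm(f"Briefly describe an NMA run on: {json.dumps(extracted)}")
    return {"data": extracted, "r_script": r_script, "report": report}

# Stubbed LLM so the sketch runs without an API key; in practice this would
# be an API call to the model under evaluation.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Extract"):
        return '{"trial": "TRIAL-1", "arms": {"A": [10, 50], "B": [5, 50]}}'
    return "# placeholder LLM response"

result = run_pipeline(["<publication text>"], fake_llm)
print(result["data"][0]["trial"])  # -> TRIAL-1
```

Separating the orchestration (`run_pipeline`) from the model call also makes it easy to rerun the same case study many times, as the authors did across 20 runs, and to swap in different models.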
Results: The LLM had a > 99% success rate of accurately extracting data across 20 runs for each case study and could generate R scripts that could be run end-to-end without human input. It also produced good quality reports describing the disease area, analysis conducted, results obtained and a correct interpretation of the results.
Conclusions: This study provides a promising indication of the feasibility of using current generation LLMs to automate data extraction, code generation and NMA result interpretation, which could result in significant time savings and reduce human error. This is provided that routine technical checks are performed, as recommended for human-conducted analyses. Whilst not currently 100% consistent, LLMs are likely to improve with time.
About the journal
PharmacoEconomics - Open focuses on applied research on the economic implications and health outcomes associated with drugs, devices and other healthcare interventions. The journal includes, but is not limited to, the following research areas:

- Economic analysis of healthcare interventions
- Health outcomes research
- Cost-of-illness studies
- Quality-of-life studies

Additional digital features (including animated abstracts, video abstracts, slide decks, audio slides, instructional videos, infographics, podcasts and animations) can be published with articles; these are designed to increase the visibility, readership and educational value of the journal's content. In addition, articles published in PharmacoEconomics - Open may be accompanied by plain language summaries to assist readers who have some knowledge of, but not in-depth expertise in, the area to understand important medical advances.

All manuscripts are subject to peer review by international experts. Letters to the Editor are welcomed and will be considered for publication.