Data Extractions Using a Large Language Model (Elicit) and Human Reviewers in Randomized Controlled Trials: A Systematic Comparison

Cochrane Evidence Synthesis and Methods. Pub Date: 2025-06-08. DOI: 10.1002/cesm.70033

Joleen Bianchi, Julian Hirt, Magdalena Vogt, Janine Vetsch

Volume 3, Issue 4. Publication type: Journal Article. Article page: https://onlinelibrary.wiley.com/doi/10.1002/cesm.70033 (PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70033)

Citations: 0

Abstract


Data Extractions Using a Large Language Model (Elicit) and Human Reviewers in Randomized Controlled Trials: A Systematic Comparison


Aim

We aimed to compare data extracted from randomized controlled trials by Elicit with data extracted by human reviewers.

Background

Elicit is an artificial intelligence tool that may automate specific steps in conducting systematic reviews. However, the tool's performance and accuracy have not been independently assessed.

Methods

For comparison, we sampled 20 randomized controlled trials whose data had been extracted manually by a human reviewer. We assessed seven variables: study objectives, sample characteristics, sample size, study design, interventions, outcomes measured, and intervention effects. We classified each Elicit extraction, relative to the human extraction, as “more,” “equal to,” “partially equal,” or “deviating.” The STROBE checklist was used to report the study.

Results

We analysed 20 randomized controlled trials from 11 countries. The studies covered diverse healthcare topics. Across all seven variables, Elicit extracted “more” data in 29.3% of cases, “equal” in 20.7%, “partially equal” in 45.7%, and “deviating” in 4.3%. Elicit provided “more” information for the variables study design (100%) and sample characteristics (45%). In contrast, for more nuanced variables, such as “intervention effects,” Elicit's extractions were less detailed, with 95% rated as “partially equal.”
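The overall percentages can be reproduced with a small tally. The sketch below is illustrative only: it assumes that each of the 20 trials was rated once on each of the seven variables (140 rated extractions in total), and the per-category counts are back-calculated from the reported percentages rather than taken from the paper. The function name `rating_percentages` is hypothetical.

```python
from collections import Counter

# The four rating categories named in the Methods section.
CATEGORIES = ("more", "equal", "partially equal", "deviating")


def rating_percentages(ratings):
    """Return each category's share of all ratings, in percent (one decimal)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    total = len(ratings)
    return {c: round(100 * counts[c] / total, 1) for c in CATEGORIES}


# ASSUMPTION: counts back-calculated from the reported percentages,
# assuming 20 trials x 7 variables = 140 rated extractions.
assumed_counts = {"more": 41, "equal": 29, "partially equal": 64, "deviating": 6}

# Expand the counts into one rating label per extraction.
ratings = [label for label, n in assumed_counts.items() for _ in range(n)]

percentages = rating_percentages(ratings)
print(percentages)
```

Under these assumed counts the four shares sum to 140 extractions and round to the reported 29.3%, 20.7%, 45.7%, and 4.3%.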

Conclusions

Elicit extracted data that were partly correct for our predefined variables. Variables like “intervention effect” or “intervention” may require a human reviewer to complete the data extraction. Our results suggest that verification by human reviewers is necessary to ensure that Elicit captures all relevant information completely and correctly.

Implications

Systematic reviews are labor-intensive. Artificial intelligence tools may facilitate the data extraction process. Using Elicit may require a human reviewer to double-check the extracted data.
