{"title":"随机对照试验中使用大型语言模型(Elicit)和人工审稿人的数据提取:系统比较","authors":"Joleen Bianchi, Julian Hirt, Magdalena Vogt, Janine Vetsch","doi":"10.1002/cesm.70033","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Aim</h3>\n \n <p>We aimed at comparing data extractions from randomized controlled trials by using Elicit and human reviewers.</p>\n </section>\n \n <section>\n \n <h3> Background</h3>\n \n <p>Elicit is an artificial intelligence tool which may automate specific steps in conducting systematic reviews. However, the tool's performance and accuracy have not been independently assessed.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>For comparison, we sampled 20 randomized controlled trials of which data were extracted manually from a human reviewer. We assessed the variables study objectives, sample characteristics and size, study design, interventions, outcome measured, and intervention effects and classified the results into “more,” “equal to,” “partially equal,” and “deviating” extractions. STROBE checklist was used to report the study.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>We analysed 20 randomized controlled trials from 11 countries. The studies covered diverse healthcare topics. Across all seven variables, Elicit extracted “more” data in 29.3% of cases, “equal” in 20.7%, “partially equal” in 45.7%, and “deviating” in 4.3%. Elicit provided “more” information for the variable study design (100%) and sample characteristics (45%). In contrast, for more nuanced variables, such as “intervention effects,” Elicit's extractions were less detailed, with 95% rated as “partially equal.”</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>Elicit was capable of extracting data partly correct for our predefined variables. Variables like “intervention effect” or “intervention” may require a human reviewer to complete the data extraction. Our results suggest that verification by human reviewers is necessary to ensure that all relevant information is captured completely and correctly by Elicit.</p>\n </section>\n \n <section>\n \n <h3> Implications</h3>\n \n <p>Systematic reviews are labor-intensive. Data extraction process may be facilitated by artificial intelligence tools. Use of Elicit may require a human reviewer to double-check the extracted data.</p>\n </section>\n </div>","PeriodicalId":100286,"journal":{"name":"Cochrane Evidence Synthesis and Methods","volume":"3 4","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70033","citationCount":"0","resultStr":"{\"title\":\"Data Extractions Using a Large Language Model (Elicit) and Human Reviewers in Randomized Controlled Trials: A Systematic Comparison\",\"authors\":\"Joleen Bianchi, Julian Hirt, Magdalena Vogt, Janine Vetsch\",\"doi\":\"10.1002/cesm.70033\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Aim</h3>\\n \\n <p>We aimed at comparing data extractions from randomized controlled trials by using Elicit and human reviewers.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>Elicit is an artificial intelligence tool which may automate specific steps in conducting systematic reviews. However, the tool's performance and accuracy have not been independently assessed.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>For comparison, we sampled 20 randomized controlled trials of which data were extracted manually from a human reviewer. We assessed the variables study objectives, sample characteristics and size, study design, interventions, outcome measured, and intervention effects and classified the results into “more,” “equal to,” “partially equal,” and “deviating” extractions. STROBE checklist was used to report the study.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>We analysed 20 randomized controlled trials from 11 countries. The studies covered diverse healthcare topics. Across all seven variables, Elicit extracted “more” data in 29.3% of cases, “equal” in 20.7%, “partially equal” in 45.7%, and “deviating” in 4.3%. Elicit provided “more” information for the variable study design (100%) and sample characteristics (45%). In contrast, for more nuanced variables, such as “intervention effects,” Elicit's extractions were less detailed, with 95% rated as “partially equal.”</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>Elicit was capable of extracting data partly correct for our predefined variables. Variables like “intervention effect” or “intervention” may require a human reviewer to complete the data extraction. Our results suggest that verification by human reviewers is necessary to ensure that all relevant information is captured completely and correctly by Elicit.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Implications</h3>\\n \\n <p>Systematic reviews are labor-intensive. Data extraction process may be facilitated by artificial intelligence tools. Use of Elicit may require a human reviewer to double-check the extracted data.</p>\\n </section>\\n </div>\",\"PeriodicalId\":100286,\"journal\":{\"name\":\"Cochrane Evidence Synthesis and Methods\",\"volume\":\"3 4\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-06-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.70033\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cochrane Evidence Synthesis and Methods\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/cesm.70033\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cochrane Evidence Synthesis and Methods","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cesm.70033","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Data Extractions Using a Large Language Model (Elicit) and Human Reviewers in Randomized Controlled Trials: A Systematic Comparison
Aim
We aimed at comparing data extractions from randomized controlled trials by using Elicit and human reviewers.
Background
Elicit is an artificial intelligence tool which may automate specific steps in conducting systematic reviews. However, the tool's performance and accuracy have not been independently assessed.
Methods
For comparison, we sampled 20 randomized controlled trials of which data were extracted manually from a human reviewer. We assessed the variables study objectives, sample characteristics and size, study design, interventions, outcome measured, and intervention effects and classified the results into “more,” “equal to,” “partially equal,” and “deviating” extractions. STROBE checklist was used to report the study.
Results
We analysed 20 randomized controlled trials from 11 countries. The studies covered diverse healthcare topics. Across all seven variables, Elicit extracted “more” data in 29.3% of cases, “equal” in 20.7%, “partially equal” in 45.7%, and “deviating” in 4.3%. Elicit provided “more” information for the variable study design (100%) and sample characteristics (45%). In contrast, for more nuanced variables, such as “intervention effects,” Elicit's extractions were less detailed, with 95% rated as “partially equal.”
Conclusions
Elicit was capable of extracting data partly correct for our predefined variables. Variables like “intervention effect” or “intervention” may require a human reviewer to complete the data extraction. Our results suggest that verification by human reviewers is necessary to ensure that all relevant information is captured completely and correctly by Elicit.
Implications
Systematic reviews are labor-intensive. Data extraction process may be facilitated by artificial intelligence tools. Use of Elicit may require a human reviewer to double-check the extracted data.