Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models

Hye Sun Yun, David Pogrebitskiy, Iain J Marshall, Byron C Wallace
{"title":"大型语言模型随机对照试验数值结果的自动提取。","authors":"Hye Sun Yun, David Pogrebitskiy, Iain J Marshall, Byron C Wallace","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individual trials to be synthesized. Ideally, language technologies would permit fully automatic meta-analysis, on demand. This requires accurately extracting numerical results from individual trials, which has been beyond the capabilities of natural language processing (NLP) models to date. In this work, we evaluate whether modern large language models (LLMs) can reliably perform this task. We annotate (and release) a modest but granular evaluation dataset of clinical trial reports with numerical findings attached to interventions, comparators, and outcomes. Using this dataset, we evaluate the performance of seven LLMs applied zero-shot for the task of conditionally extracting numerical findings from trial reports. We find that massive LLMs that can accommodate lengthy inputs are tantalizingly close to realizing fully automatic meta-analysis, especially for dichotomous (binary) outcomes (e.g., mortality). However, LLMs-including ones trained on biomedical texts-perform poorly when the outcome measures are complex and tallying the results requires inference. This work charts a path toward fully automatic meta-analysis of RCTs via LLMs, while also highlighting the limitations of existing models for this aim.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"252 ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448672/pdf/","citationCount":"0","resultStr":"{\"title\":\"Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models.\",\"authors\":\"Hye Sun Yun, David Pogrebitskiy, Iain J Marshall, Byron C Wallace\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individual trials to be synthesized. Ideally, language technologies would permit fully automatic meta-analysis, on demand. This requires accurately extracting numerical results from individual trials, which has been beyond the capabilities of natural language processing (NLP) models to date. In this work, we evaluate whether modern large language models (LLMs) can reliably perform this task. We annotate (and release) a modest but granular evaluation dataset of clinical trial reports with numerical findings attached to interventions, comparators, and outcomes. Using this dataset, we evaluate the performance of seven LLMs applied zero-shot for the task of conditionally extracting numerical findings from trial reports. 
We find that massive LLMs that can accommodate lengthy inputs are tantalizingly close to realizing fully automatic meta-analysis, especially for dichotomous (binary) outcomes (e.g., mortality). However, LLMs-including ones trained on biomedical texts-perform poorly when the outcome measures are complex and tallying the results requires inference. This work charts a path toward fully automatic meta-analysis of RCTs via LLMs, while also highlighting the limitations of existing models for this aim.</p>\",\"PeriodicalId\":74504,\"journal\":{\"name\":\"Proceedings of machine learning research\",\"volume\":\"252 \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448672/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of machine learning research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of machine learning research","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract


Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individual trials to be synthesized. Ideally, language technologies would permit fully automatic meta-analysis, on demand. This requires accurately extracting numerical results from individual trials, which has been beyond the capabilities of natural language processing (NLP) models to date. In this work, we evaluate whether modern large language models (LLMs) can reliably perform this task. We annotate (and release) a modest but granular evaluation dataset of clinical trial reports with numerical findings attached to interventions, comparators, and outcomes. Using this dataset, we evaluate the performance of seven LLMs applied zero-shot for the task of conditionally extracting numerical findings from trial reports. We find that massive LLMs that can accommodate lengthy inputs are tantalizingly close to realizing fully automatic meta-analysis, especially for dichotomous (binary) outcomes (e.g., mortality). However, LLMs, including ones trained on biomedical texts, perform poorly when the outcome measures are complex and tallying the results requires inference. This work charts a path toward fully automatic meta-analysis of RCTs via LLMs, while also highlighting the limitations of existing models for this aim.
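
To make the conditional extraction task concrete, here is a minimal sketch of how one might pose it to an LLM zero-shot: given a trial report and one (intervention, comparator, outcome) triple, ask for the per-arm counts as JSON. The ICOQuery class, JSON field names, and prompt wording are illustrative assumptions, not the schema or prompts used in the paper.

```python
from dataclasses import dataclass
import json

@dataclass
class ICOQuery:
    """One extraction target: an (Intervention, Comparator, Outcome) triple."""
    intervention: str
    comparator: str
    outcome: str

def build_extraction_prompt(report_text: str, query: ICOQuery) -> str:
    """Compose a zero-shot prompt asking an LLM to return the numerical
    finding for one ICO triple as JSON. Hypothetical wording, for a
    dichotomous outcome; not the paper's actual prompt."""
    schema = {
        "intervention_events": "<int or null>",
        "intervention_total": "<int or null>",
        "comparator_events": "<int or null>",
        "comparator_total": "<int or null>",
    }
    return (
        "Extract the numerical finding for the following comparison from the "
        "clinical trial report below. Respond with JSON only, matching this "
        f"schema: {json.dumps(schema)}\n"
        f"Intervention: {query.intervention}\n"
        f"Comparator: {query.comparator}\n"
        f"Outcome: {query.outcome}\n\n"
        f"Report:\n{report_text}"
    )
```

A caller would send the returned string to whichever LLM is under evaluation and parse the JSON reply; null fields would signal that a value is not directly reported and must be inferred (the failure mode the abstract highlights for complex outcome measures).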

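Once per-arm counts have been extracted for a dichotomous outcome, standard meta-analytic arithmetic can pool them across trials. The sketch below is textbook inverse-variance fixed-effect pooling of log odds ratios, included only to show how extracted numbers would feed a fully automatic meta-analysis; the demo counts are made up, and none of this code is from the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class DichotomousResult:
    """Extracted 2x2 counts for one trial: events/total in each arm."""
    events_tx: int   # events in the intervention arm
    n_tx: int        # participants in the intervention arm
    events_ctl: int  # events in the comparator arm
    n_ctl: int       # participants in the comparator arm

def fixed_effect_log_or(trials: list[DichotomousResult]) -> tuple[float, float, float]:
    """Inverse-variance fixed-effect pooled log odds ratio with a 95% CI.
    Standard meta-analysis arithmetic; 0.5 continuity correction for zero cells."""
    num = den = 0.0
    for t in trials:
        a, b = t.events_tx, t.n_tx - t.events_tx
        c, d = t.events_ctl, t.n_ctl - t.events_ctl
        if 0 in (a, b, c, d):  # continuity correction for zero cells
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        log_or = math.log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d   # variance of the log odds ratio
        num += log_or / var                    # weight each trial by 1/variance
        den += 1 / var
    pooled = num / den
    se = math.sqrt(1 / den)
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

# Made-up counts, purely to exercise the function:
trials = [DichotomousResult(12, 100, 20, 100), DichotomousResult(5, 50, 9, 52)]
log_or, lo, hi = fixed_effect_log_or(trials)
print(f"pooled OR = {math.exp(log_or):.2f} (95% CI {math.exp(lo):.2f}-{math.exp(hi):.2f})")
```

The point of the exercise: every quantity in this pooling step comes straight from the extracted per-arm counts, so a single extraction error propagates directly into the pooled effect estimate.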