Grammar-constrained decoding for structured information extraction with fine-tuned generative models applied to clinical trial abstracts.

IF 3.0 | Q2 | COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Frontiers in Artificial Intelligence | Pub Date: 2025-01-07 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1406857
David M Schmidt, Philipp Cimiano
{"title":"Grammar-constrained decoding for structured information extraction with fine-tuned generative models applied to clinical trial abstracts.","authors":"David M Schmidt, Philipp Cimiano","doi":"10.3389/frai.2024.1406857","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>In the field of structured information extraction, there are typically semantic and syntactic constraints on the output of information extraction (IE) systems. These constraints, however, can typically not be guaranteed using standard (fine-tuned) encoder-decoder architectures. This has led to the development of constrained decoding approaches which allow, e.g., to specify constraints in form of context-free grammars. An open question is in how far an IE system can be effectively guided by a domain-specific grammar to ensure that the output structures follow the requirements of a certain domain data model.</p><p><strong>Methods: </strong>In this work we experimentally investigate the influence of grammar-constrained decoding as well as pointer generators on the performance of a domain-specific information extraction system. For this, we consider fine-tuned encoder-decoder models, Longformer and Flan-T5 in particular, and experimentally investigate whether the addition of grammar-constrained decoding and pointer generators improve information extraction results. Toward this goal, we consider the task of inducing structured representations from abstracts describing clinical trials, relying on the C-TrO ontology to semantically describe the clinical trials and their results. We frame the task as a slot filling problem where certain slots of templates need to be filled with token sequences occurring in the input text. We use a dataset comprising 211 annotated clinical trial abstracts about type 2 diabetes and glaucoma for training and evaluation. Our focus is on settings in which the available training data is in the order of a few hundred training examples, which we consider as a <i>low-resource setting</i>.</p><p><strong>Results: </strong>In all our experiments we could demonstrate the positive impact of grammar-constrained decoding, with an increase in <i>F</i> <sub>1</sub> score of pp 0.351 (absolute score 0.413) and pp 0.425 (absolute score 0.47) for the best-performing models on type 2 diabetes and glaucoma datasets, respectively. The addition of the pointer generators had a detrimental impact on the results, decreasing <i>F</i> <sub>1</sub> scores by pp 0.15 (absolute score 0.263) and pp 0.198 (absolute score 0.272) for the best-performing pointer generator models on type 2 diabetes and glaucoma datasets, respectively.</p><p><strong>Conclusion: </strong>The experimental results indicate that encoder-decoder models used for structure prediction for information extraction tasks in low-resource settings clearly benefit from grammar-constrained decoding guiding the output generation. In contrast, the evaluated pointer generator models decreased the performance drastically in some cases. Moreover, the performance of the pointer models appears to depend both on the used base model as well as the function used for aggregating the attention values. 
How the size of large language models affects the performance benefit of grammar-constrained decoding remains to be more structurally investigated in future work.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1406857"},"PeriodicalIF":3.0000,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747381/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frai.2024.1406857","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Background: In the field of structured information extraction, the output of information extraction (IE) systems is typically subject to semantic and syntactic constraints. These constraints, however, cannot in general be guaranteed by standard (fine-tuned) encoder-decoder architectures. This has led to the development of constrained decoding approaches, which allow such constraints to be specified, for example, in the form of context-free grammars. An open question is to what extent an IE system can be effectively guided by a domain-specific grammar to ensure that the output structures follow the requirements of a given domain data model.
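To make the idea of grammar-constrained decoding concrete, the following minimal sketch masks the scores of a (mock) decoder at each step so that only tokens permitted by a small toy grammar can be generated. The vocabulary, grammar, and scoring function are invented for illustration and do not reproduce the system described in the paper.

```python
# Minimal sketch of grammar-constrained decoding (not the authors' code).
# At each step, the scores produced by a mock model are masked so that only
# tokens permitted by a small template grammar can be generated.
import math
import random

# Toy output vocabulary for a flat slot-filling template.
VOCAB = ["[DRUG", "[DOSE", "value=", "]", "<eos>", "metformin", "10", "mg"]

# A tiny state machine standing in for a context-free grammar:
# state -> tokens that may follow, and the state each token leads to.
GRAMMAR = {
    "start": {"[DRUG": "slot", "[DOSE": "slot", "<eos>": "done"},
    "slot":  {"value=": "value"},
    "value": {"metformin": "fill", "10": "fill", "mg": "fill"},
    "fill":  {"metformin": "fill", "10": "fill", "mg": "fill", "]": "start"},
    "done":  {},
}

def mock_model_scores(prefix):
    """Stand-in for encoder-decoder logits; a real system queries the LM here."""
    random.seed(len(prefix))  # deterministic toy scores
    return {tok: random.uniform(0, 1) for tok in VOCAB}

def constrained_decode(max_steps=12):
    state, output = "start", []
    for _ in range(max_steps):
        allowed = GRAMMAR[state]
        if not allowed:  # grammar says the structure is complete
            break
        scores = mock_model_scores(output)
        # Mask: disallowed tokens get -inf so they can never be chosen.
        masked = {t: (s if t in allowed else -math.inf) for t, s in scores.items()}
        token = max(masked, key=masked.get)
        if token == "<eos>":
            break
        output.append(token)
        state = allowed[token]
    return output

print(" ".join(constrained_decode()))
```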

Methods: In this work we experimentally investigate the influence of grammar-constrained decoding and of pointer generators on the performance of a domain-specific information extraction system. To this end, we consider fine-tuned encoder-decoder models, Longformer and Flan-T5 in particular, and experimentally investigate whether the addition of grammar-constrained decoding and pointer generators improves information extraction results. Specifically, we consider the task of inducing structured representations from abstracts describing clinical trials, relying on the C-TrO ontology to semantically describe the clinical trials and their results. We frame the task as a slot-filling problem in which certain slots of templates need to be filled with token sequences occurring in the input text. We use a dataset comprising 211 annotated clinical trial abstracts about type 2 diabetes and glaucoma for training and evaluation. Our focus is on settings in which the available training data is on the order of a few hundred examples, which we consider a low-resource setting.
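A rough illustration of the slot-filling framing is given below: each template corresponds to a record whose slots may only be filled with token spans that occur verbatim in the abstract. The Template class and slot names are hypothetical and do not reproduce the actual C-TrO template schema.

```python
# Illustrative sketch of the slot-filling framing (slot names are invented
# and do not reproduce the actual C-TrO template schema).
from dataclasses import dataclass, field

@dataclass
class Template:
    """One structured record to be induced from a clinical trial abstract."""
    template_type: str                          # e.g. an ontology class name
    slots: dict = field(default_factory=dict)   # slot name -> token span from the text

def fill_slot(template, slot, span, abstract):
    """Fill a slot only with a span that literally occurs in the input text,
    mirroring the constraint that slot values are token sequences from the abstract."""
    if span not in abstract:
        raise ValueError(f"span {span!r} does not occur in the abstract")
    template.slots[slot] = span
    return template

abstract = "Patients received 10 mg of drug X once daily; HbA1c decreased by 0.8%."
arm = Template(template_type="Arm")
fill_slot(arm, "medication", "drug X", abstract)
fill_slot(arm, "dose", "10 mg", abstract)
print(arm)
```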

Results: In all our experiments we could demonstrate the positive impact of grammar-constrained decoding, with an increase in F1 score of 0.351 (absolute score 0.413) and 0.425 (absolute score 0.47) for the best-performing models on the type 2 diabetes and glaucoma datasets, respectively. The addition of the pointer generators had a detrimental impact on the results, decreasing F1 scores by 0.15 (absolute score 0.263) and 0.198 (absolute score 0.272) for the best-performing pointer generator models on the type 2 diabetes and glaucoma datasets, respectively.
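As a point of reference for how such scores can be computed, the sketch below implements a simple slot-level F1 over (template, slot, value) triples; the paper's exact matching and aggregation rules may differ.

```python
# Hedged sketch of a slot-level F1 computation; the paper's exact matching
# rules (e.g. how templates are aligned between prediction and gold) may differ.
def slot_f1(gold, pred):
    """gold, pred: sets of (template_id, slot, value) triples."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {("t1", "medication", "drug X"), ("t1", "dose", "10 mg")}
pred = {("t1", "medication", "drug X"), ("t1", "dose", "5 mg")}
print(round(slot_f1(gold, pred), 3))   # 0.5
```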

Conclusion: The experimental results indicate that encoder-decoder models used for structure prediction in information extraction tasks in low-resource settings clearly benefit from grammar-constrained decoding guiding the output generation. In contrast, the evaluated pointer generator models decreased performance drastically in some cases. Moreover, the performance of the pointer models appears to depend both on the base model used and on the function used for aggregating the attention values. How the size of large language models affects the performance benefit of grammar-constrained decoding remains to be investigated more systematically in future work.
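To illustrate the attention-aggregation choice mentioned above, the following sketch derives a pointer (copy) distribution over input tokens from multi-head cross-attention weights, either by averaging over heads or by taking their maximum; the aggregation functions actually evaluated in the paper may differ.

```python
# Sketch of aggregating multi-head cross-attention into a pointer distribution
# (illustrative only; not the aggregation functions evaluated in the paper).
import numpy as np

def pointer_distribution(cross_attention, how="mean"):
    """cross_attention: array of shape (num_heads, num_input_tokens) for the
    current decoding step; returns a normalized copy distribution over input tokens."""
    agg = cross_attention.mean(axis=0) if how == "mean" else cross_attention.max(axis=0)
    return agg / agg.sum()

attn = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1]])     # two heads, three input tokens
print(pointer_distribution(attn, "mean"))
print(pointer_distribution(attn, "max"))
```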

Source journal metrics (Frontiers in Artificial Intelligence)
CiteScore: 6.10
Self-citation rate: 2.50%
Articles published: 272
Review time: 13 weeks