Large-scale AMR Corpus with Re-generated Sentences: Domain Adaptive Pre-training on ACL Anthology Corpus

2022 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Pub Date: 2022-10-01, DOI: 10.1109/ICACSIS56558.2022.9923502

Mingyi Zhao, Yaling Wang, Y. Lepage


Abstract

Abstract Meaning Representation (AMR) is a broad-coverage formalism for capturing the semantics of a given sentence. However, domain adaptation of AMR is limited by the shortage of annotated AMR graphs. In this paper, we explore and build a new large-scale dataset with 2.3 million AMRs in the domain of academic writing. Additionally, we show that 30% of them are of similar quality to the annotated data in the downstream AMR-to-text task. Our results outperform previous graph-based approaches by over 11 BLEU points. We provide a pipeline that integrates automated generation and evaluation, which can help explore other AMR benchmarks.
