Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation

Ao Liu, Congjian Luo, Naoaki Okazaki
DOI: 10.2197/ipsjjip.31.332
Journal: J. Inf. Process.
Published: 2021-12-12
Citations: 4

Abstract

Logical Natural Language Generation, i.e., generating textual descriptions that can be logically entailed by a structured table, remains challenging due to the low fidelity of the generated text. Chen et al. (2020) addressed this problem by annotating intermediate logical programs to control the content and semantics of the generation, and presented the task of table-aware logical-form-to-text (Logic2text) generation. However, although table instances are abundant in the real world, logical forms paired with textual descriptions require costly human annotation, which limits the performance of neural models. To mitigate this, we propose topic-conditioned data augmentation (TopicDA), which utilizes GPT-2 to generate unpaired logical forms and textual descriptions directly from tables. We further introduce logical form generation (LG), a dual task of Logic2text that requires generating a valid logical form from a textual description of a table. We also propose a semi-supervised learning approach to jointly train a Logic2text model and an LG model on both labeled and augmented data. The two models benefit from each other by providing extra supervision signals through back-translation. Experimental results on the Logic2text dataset and the LG task demonstrate that our approach effectively utilizes the augmented data and outperforms supervised baselines by a substantial margin.
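The back-translation scheme described above can be illustrated with a minimal sketch. This is not the authors' implementation: the model interfaces (`logic2text`, `lg`) are hypothetical stand-ins for the two trained dual models, and the example only shows the data flow in which each model pseudo-labels the unpaired examples of the other.

```python
# Hypothetical sketch of one back-translation round between the dual models.
# `logic2text` maps a logical form to a textual description; `lg` maps a
# textual description back to a logical form. Both are placeholders here.

def back_translate(unpaired_lfs, unpaired_texts, logic2text, lg):
    """One round of mutual pseudo-labeling between the two dual models."""
    # Logic2text labels unpaired logical forms with pseudo texts,
    # yielding extra (text, logical form) training pairs for the LG model.
    lg_pseudo = [(logic2text(lf), lf) for lf in unpaired_lfs]
    # LG labels unpaired texts with pseudo logical forms,
    # yielding extra (logical form, text) training pairs for Logic2text.
    l2t_pseudo = [(lg(t), t) for t in unpaired_texts]
    return lg_pseudo, l2t_pseudo

# Toy stand-in "models" so the sketch runs end to end.
logic2text = lambda lf: f"text<{lf}>"
lg = lambda text: f"lf<{text}>"

lg_pairs, l2t_pairs = back_translate(
    ["eq{max{all_rows;points};10}"], ["row 3 has the highest points"],
    logic2text, lg,
)
```

In the actual semi-supervised setup, these pseudo-pairs would be mixed with the labeled and TopicDA-augmented data when training each model, so the supervision signal flows in both directions.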