Promoting Unsupervised Data-To-Text Generation Using Retraining and Unified Linearization

IF 1.5 · CAS Tier 4 (Computer Science) · JCR Q3 (Computer Science, Software Engineering)
Xiaobo Wang, Xuan Zhang, Jing Cheng, Kunpeng Du, Chen Gao, Zhuxian Ma, Bo Liu
{"title":"Promoting Unsupervised Data-To-Text Generation Using Retraining and Unified Linearization","authors":"Xiaobo Wang,&nbsp;Xuan Zhang,&nbsp;Jing Cheng,&nbsp;Kunpeng Du,&nbsp;Chen Gao,&nbsp;Zhuxian Ma,&nbsp;Bo Liu","doi":"10.1002/cpe.70254","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>In recent years, many studies have focused on unsupervised data-to-text generation methods. However, existing unsupervised methods still require a large amount of unlabeled sample training, leading to significant data collection overhead. We propose a low-resource unsupervised method called CycleRUR. This method first converts various forms of structured data (such as tables, knowledge graph(KG) triples, and meaning representations(MR)) into unified KG triples to improve the model's ability to adapt to different structured data. Additionally, CycleRUR incorporates a retraining module and a contrastive learning module within a cycle training framework, enabling the model to learn and converge from a small amount of unpaired KG triples and reference text corpus, thereby improving the model's accuracy and convergence speed. We evaluated the model's performance on the WebNLG and E2E datasets. Using only 10% of unpaired training data, our method achieved the effects of fully supervised fine-tuning. On the WebNLG dataset, it resulted in an 18.41% improvement in METEOR compared to supervised models. On the E2E dataset, it achieved improvements of 1.37% in METEOR and 4.97% in BLEU. Experiments also demonstrated that under unified linearization, CycleRUR exhibits good generalization capabilities.</p>\n </div>","PeriodicalId":55214,"journal":{"name":"Concurrency and Computation-Practice & Experience","volume":"37 23-24","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Concurrency and Computation-Practice & Experience","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpe.70254","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

In recent years, many studies have focused on unsupervised data-to-text generation. However, existing unsupervised methods still require training on a large number of unlabeled samples, which incurs significant data collection overhead. We propose a low-resource unsupervised method called CycleRUR. The method first converts various forms of structured data (such as tables, knowledge graph (KG) triples, and meaning representations (MRs)) into unified KG triples, improving the model's ability to adapt to different kinds of structured data. Additionally, CycleRUR incorporates a retraining module and a contrastive learning module within a cycle training framework, enabling the model to learn and converge from a small amount of unpaired KG triples and reference texts, thereby improving accuracy and convergence speed. We evaluated the model on the WebNLG and E2E datasets. Using only 10% of the unpaired training data, our method matched the performance of fully supervised fine-tuning. On WebNLG, it improved METEOR by 18.41% compared to supervised models; on E2E, it improved METEOR by 1.37% and BLEU by 4.97%. Experiments also demonstrated that, under unified linearization, CycleRUR exhibits good generalization capabilities.
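The abstract's two core ideas can be made concrete with a short sketch. Below is a minimal Python illustration of unified linearization, assuming hypothetical <H>/<R>/<T> separator tokens and helper names; the paper's exact serialization format is not published here, so none of this should be read as CycleRUR's implementation.

```python
# A minimal sketch of unified linearization, assuming a triple layout with
# <H>/<R>/<T> separator tokens. Every name and token here is illustrative,
# not the paper's published format.

from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail) KG triple


def mr_to_triples(entity: str, mr: Dict[str, str]) -> List[Triple]:
    """Convert an E2E-style meaning representation (slot-value pairs
    describing one entity) into KG triples with the entity as head."""
    return [(entity, slot, value) for slot, value in mr.items()]


def table_row_to_triples(row: Dict[str, str], key_column: str) -> List[Triple]:
    """Convert one table row into triples: the key cell becomes the head;
    every other (column, cell) pair becomes a (relation, tail)."""
    head = row[key_column]
    return [(head, col, val) for col, val in row.items() if col != key_column]


def linearize(triples: List[Triple]) -> str:
    """Serialize unified triples into the flat string a seq2seq model reads."""
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)


# Both input forms collapse to the same triple-based encoding:
mr = {"eatType": "coffee shop", "area": "riverside"}
row = {"name": "Blue Spice", "eatType": "coffee shop", "area": "riverside"}
assert linearize(mr_to_triples("Blue Spice", mr)) == \
       linearize(table_row_to_triples(row, key_column="name"))
```

The cycle-training framework can be sketched in the same hedged spirit: a data-to-text model and a text-to-data model bootstrap each other from unpaired corpora, and each round retrains one model on pseudo-pairs generated by the other. The generate()/fit() API is an assumption, and the paper's retraining and contrastive-learning modules are only approximated by this back-translation-style loop.

```python
# A hedged skeleton of the cycle-training loop, reusing linearize() from the
# sketch above. CycleRUR's actual modules are not reproduced; the model API
# (generate/fit) is an assumption made for illustration only.

def cycle_train(d2t, t2d, triples_corpus, text_corpus, rounds=5):
    """d2t: data-to-text model; t2d: text-to-data model (assumed API)."""
    for _ in range(rounds):
        # Triples -> pseudo text; retrain t2d on (pseudo text, triples) pairs.
        flat = [linearize(t) for t in triples_corpus]
        pseudo_text = [d2t.generate(x) for x in flat]
        t2d.fit(inputs=pseudo_text, targets=flat)

        # Text -> pseudo triples; retrain d2t on (pseudo triples, text) pairs.
        pseudo_triples = [t2d.generate(s) for s in text_corpus]
        d2t.fit(inputs=pseudo_triples, targets=text_corpus)
    return d2t, t2d
```

In this setup, unified linearization is what would let the same two models serve WebNLG triples and E2E MRs alike, since both reduce to one flat triple encoding before entering the cycle.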

Source journal

Concurrency and Computation-Practice & Experience (CAS: Engineering & Technology; JCR: Computer Science, Theory & Methods)
CiteScore: 5.00
Self-citation rate: 10.00%
Annual publications: 664
Review time: 9.6 months
Journal description: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.