Xiaobo Wang, Xuan Zhang, Jing Cheng, Kunpeng Du, Chen Gao, Zhuxian Ma, Bo Liu
{"title":"使用再训练和统一线性化促进无监督数据到文本的生成","authors":"Xiaobo Wang, Xuan Zhang, Jing Cheng, Kunpeng Du, Chen Gao, Zhuxian Ma, Bo Liu","doi":"10.1002/cpe.70254","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>In recent years, many studies have focused on unsupervised data-to-text generation methods. However, existing unsupervised methods still require a large amount of unlabeled sample training, leading to significant data collection overhead. We propose a low-resource unsupervised method called CycleRUR. This method first converts various forms of structured data (such as tables, knowledge graph(KG) triples, and meaning representations(MR)) into unified KG triples to improve the model's ability to adapt to different structured data. Additionally, CycleRUR incorporates a retraining module and a contrastive learning module within a cycle training framework, enabling the model to learn and converge from a small amount of unpaired KG triples and reference text corpus, thereby improving the model's accuracy and convergence speed. We evaluated the model's performance on the WebNLG and E2E datasets. Using only 10% of unpaired training data, our method achieved the effects of fully supervised fine-tuning. On the WebNLG dataset, it resulted in an 18.41% improvement in METEOR compared to supervised models. On the E2E dataset, it achieved improvements of 1.37% in METEOR and 4.97% in BLEU. Experiments also demonstrated that under unified linearization, CycleRUR exhibits good generalization capabilities.</p>\n </div>","PeriodicalId":55214,"journal":{"name":"Concurrency and Computation-Practice & Experience","volume":"37 23-24","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Promoting Unsupervised Data-To-Text Generation Using Retraining and Unified Linearization\",\"authors\":\"Xiaobo Wang, Xuan Zhang, Jing Cheng, Kunpeng Du, Chen Gao, Zhuxian Ma, Bo Liu\",\"doi\":\"10.1002/cpe.70254\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>In recent years, many studies have focused on unsupervised data-to-text generation methods. However, existing unsupervised methods still require a large amount of unlabeled sample training, leading to significant data collection overhead. We propose a low-resource unsupervised method called CycleRUR. This method first converts various forms of structured data (such as tables, knowledge graph(KG) triples, and meaning representations(MR)) into unified KG triples to improve the model's ability to adapt to different structured data. Additionally, CycleRUR incorporates a retraining module and a contrastive learning module within a cycle training framework, enabling the model to learn and converge from a small amount of unpaired KG triples and reference text corpus, thereby improving the model's accuracy and convergence speed. We evaluated the model's performance on the WebNLG and E2E datasets. Using only 10% of unpaired training data, our method achieved the effects of fully supervised fine-tuning. On the WebNLG dataset, it resulted in an 18.41% improvement in METEOR compared to supervised models. On the E2E dataset, it achieved improvements of 1.37% in METEOR and 4.97% in BLEU. 
Experiments also demonstrated that under unified linearization, CycleRUR exhibits good generalization capabilities.</p>\\n </div>\",\"PeriodicalId\":55214,\"journal\":{\"name\":\"Concurrency and Computation-Practice & Experience\",\"volume\":\"37 23-24\",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2025-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Concurrency and Computation-Practice & Experience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/cpe.70254\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Concurrency and Computation-Practice & Experience","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpe.70254","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Promoting Unsupervised Data-To-Text Generation Using Retraining and Unified Linearization
In recent years, many studies have focused on unsupervised data-to-text generation. However, existing unsupervised methods still require training on large amounts of unlabeled samples, which incurs significant data collection overhead. We propose a low-resource unsupervised method called CycleRUR. It first converts various forms of structured data, such as tables, knowledge graph (KG) triples, and meaning representations (MR), into unified KG triples, improving the model's ability to adapt to different structured inputs. In addition, CycleRUR incorporates a retraining module and a contrastive learning module within a cycle training framework, enabling the model to learn and converge from a small amount of unpaired KG triples and reference text, thereby improving accuracy and convergence speed. We evaluated the model on the WebNLG and E2E datasets. Using only 10% of the unpaired training data, our method matched the performance of fully supervised fine-tuning. On the WebNLG dataset, it improved METEOR by 18.41% over supervised models; on the E2E dataset, it improved METEOR by 1.37% and BLEU by 4.97%. Experiments also showed that, under unified linearization, CycleRUR exhibits good generalization capabilities.
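To make the "unified linearization" step concrete, here is a minimal sketch of the general idea the abstract describes: heterogeneous structured inputs (table rows, E2E-style MR slot-value pairs, KG triples) are mapped to a common triple form and then flattened into a single string a seq2seq encoder can consume. All names and the role-tag scheme (`to_triples` helpers, `<S>`/`<P>`/`<O>` markers) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of unified linearization; not the paper's code.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def table_row_to_triples(entity: str, row: Dict[str, str]) -> List[Triple]:
    """Treat each column header as a predicate linking the row's entity
    to the cell value."""
    return [(entity, header, value) for header, value in row.items()]

def mr_to_triples(entity: str, slots: Dict[str, str]) -> List[Triple]:
    """E2E-style meaning representations are slot-value pairs; map each
    slot to a predicate on the described entity."""
    return [(entity, slot, value) for slot, value in slots.items()]

def linearize(triples: List[Triple]) -> str:
    """Flatten unified triples into one input string using special role
    tags, so a single encoder can handle any source format."""
    return " ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in triples)

if __name__ == "__main__":
    # E2E-style MR for a restaurant
    mr = {"eatType": "pub", "food": "English", "area": "riverside"}
    # WebNLG-style KG triple, already in unified form
    kg: List[Triple] = [("The Eagle", "priceRange", "moderate")]

    unified = mr_to_triples("The Eagle", mr) + kg
    print(linearize(unified))
    # <S> The Eagle <P> eatType <O> pub <S> The Eagle <P> food <O> English ...
```

Under this kind of scheme, the same model sees one input format regardless of whether the source was a table, an MR, or a KG, which is what allows the cycle-training framework to learn from a mixed, unpaired corpus.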
Journal introduction:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality original research papers and authoritative research review papers in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.