Xiaobo Wang, Xuan Zhang, Jing Cheng, Kunpeng Du, Chen Gao, Zhuxian Ma, Bo Liu
{"title":"使用再训练和统一线性化促进无监督数据到文本的生成","authors":"Xiaobo Wang, Xuan Zhang, Jing Cheng, Kunpeng Du, Chen Gao, Zhuxian Ma, Bo Liu","doi":"10.1002/cpe.70254","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>In recent years, many studies have focused on unsupervised data-to-text generation methods. However, existing unsupervised methods still require a large amount of unlabeled sample training, leading to significant data collection overhead. We propose a low-resource unsupervised method called CycleRUR. This method first converts various forms of structured data (such as tables, knowledge graph(KG) triples, and meaning representations(MR)) into unified KG triples to improve the model's ability to adapt to different structured data. Additionally, CycleRUR incorporates a retraining module and a contrastive learning module within a cycle training framework, enabling the model to learn and converge from a small amount of unpaired KG triples and reference text corpus, thereby improving the model's accuracy and convergence speed. We evaluated the model's performance on the WebNLG and E2E datasets. Using only 10% of unpaired training data, our method achieved the effects of fully supervised fine-tuning. On the WebNLG dataset, it resulted in an 18.41% improvement in METEOR compared to supervised models. On the E2E dataset, it achieved improvements of 1.37% in METEOR and 4.97% in BLEU. Experiments also demonstrated that under unified linearization, CycleRUR exhibits good generalization capabilities.</p>\n </div>","PeriodicalId":55214,"journal":{"name":"Concurrency and Computation-Practice & Experience","volume":"37 23-24","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Promoting Unsupervised Data-To-Text Generation Using Retraining and Unified Linearization\",\"authors\":\"Xiaobo Wang, Xuan Zhang, Jing Cheng, Kunpeng Du, Chen Gao, Zhuxian Ma, Bo Liu\",\"doi\":\"10.1002/cpe.70254\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>In recent years, many studies have focused on unsupervised data-to-text generation methods. However, existing unsupervised methods still require a large amount of unlabeled sample training, leading to significant data collection overhead. We propose a low-resource unsupervised method called CycleRUR. This method first converts various forms of structured data (such as tables, knowledge graph(KG) triples, and meaning representations(MR)) into unified KG triples to improve the model's ability to adapt to different structured data. Additionally, CycleRUR incorporates a retraining module and a contrastive learning module within a cycle training framework, enabling the model to learn and converge from a small amount of unpaired KG triples and reference text corpus, thereby improving the model's accuracy and convergence speed. We evaluated the model's performance on the WebNLG and E2E datasets. Using only 10% of unpaired training data, our method achieved the effects of fully supervised fine-tuning. On the WebNLG dataset, it resulted in an 18.41% improvement in METEOR compared to supervised models. On the E2E dataset, it achieved improvements of 1.37% in METEOR and 4.97% in BLEU. 
Experiments also demonstrated that under unified linearization, CycleRUR exhibits good generalization capabilities.</p>\\n </div>\",\"PeriodicalId\":55214,\"journal\":{\"name\":\"Concurrency and Computation-Practice & Experience\",\"volume\":\"37 23-24\",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2025-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Concurrency and Computation-Practice & Experience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/cpe.70254\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Concurrency and Computation-Practice & Experience","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpe.70254","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Promoting Unsupervised Data-To-Text Generation Using Retraining and Unified Linearization
In recent years, many studies have focused on unsupervised data-to-text generation. However, existing unsupervised methods still require training on large amounts of unlabeled samples, which incurs significant data collection overhead. We propose a low-resource unsupervised method called CycleRUR. It first converts various forms of structured data, such as tables, knowledge graph (KG) triples, and meaning representations (MR), into unified KG triples, improving the model's ability to adapt to different structured inputs. In addition, CycleRUR incorporates a retraining module and a contrastive learning module within a cycle training framework, enabling the model to learn and converge from a small amount of unpaired KG triples and reference text, thereby improving accuracy and convergence speed. We evaluated the model on the WebNLG and E2E datasets. Using only 10% of the unpaired training data, our method matched the performance of fully supervised fine-tuning. On the WebNLG dataset, it improved METEOR by 18.41% over supervised models; on the E2E dataset, it improved METEOR by 1.37% and BLEU by 4.97%. Experiments also showed that, under unified linearization, CycleRUR exhibits good generalization capabilities.
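To make the "unified linearization" step concrete, here is a minimal sketch of the general idea the abstract describes: heterogeneous structured inputs (table rows, E2E-style MR slot-value pairs, KG triples) are mapped to a common triple form and then flattened into a single string a seq2seq encoder can consume. All names and the role-tag scheme (`to_triples` helpers, `<S>`/`<P>`/`<O>` markers) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of unified linearization; not the paper's code.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def table_row_to_triples(entity: str, row: Dict[str, str]) -> List[Triple]:
    """Treat each column header as a predicate linking the row's entity
    to the cell value."""
    return [(entity, header, value) for header, value in row.items()]

def mr_to_triples(entity: str, slots: Dict[str, str]) -> List[Triple]:
    """E2E-style meaning representations are slot-value pairs; map each
    slot to a predicate on the described entity."""
    return [(entity, slot, value) for slot, value in slots.items()]

def linearize(triples: List[Triple]) -> str:
    """Flatten unified triples into one input string using special role
    tags, so a single encoder can handle any source format."""
    return " ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in triples)

if __name__ == "__main__":
    # E2E-style MR for a restaurant
    mr = {"eatType": "pub", "food": "English", "area": "riverside"}
    # WebNLG-style KG triple, already in unified form
    kg: List[Triple] = [("The Eagle", "priceRange", "moderate")]

    unified = mr_to_triples("The Eagle", mr) + kg
    print(linearize(unified))
    # <S> The Eagle <P> eatType <O> pub <S> The Eagle <P> food <O> English ...
```

Under this kind of scheme, the same model sees one input format regardless of whether the source was a table, an MR, or a KG, which is what allows the cycle-training framework to learn from a mixed, unpaired corpus.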
Journal introduction:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality original research papers and authoritative research review papers in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.