{"title":"用事实错误检测改进文本简化","authors":"Yuan Ma, Sandaru Seneviratne, E. Daskalaki","doi":"10.18653/v1/2022.tsar-1.16","DOIUrl":null,"url":null,"abstract":"In the past few years, the field of text simplification has been dominated by supervised learning approaches thanks to the appearance of large parallel datasets such as Wikilarge and Newsela. However, these datasets suffer from sentence pairs with factuality errors which compromise the models’ performance. So, we proposed a model-independent factuality error detection mechanism, considering bad simplification and bad alignment, to refine the Wikilarge dataset through reducing the weight of these samples during training. We demonstrated that this approach improved the performance of the state-of-the-art text simplification model TST5 by an FKGL reduction of 0.33 and 0.29 on the TurkCorpus and ASSET testing datasets respectively. Our study illustrates the impact of erroneous samples in TS datasets and highlights the need for automatic methods to improve their quality.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Improving Text Simplification with Factuality Error Detection\",\"authors\":\"Yuan Ma, Sandaru Seneviratne, E. Daskalaki\",\"doi\":\"10.18653/v1/2022.tsar-1.16\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the past few years, the field of text simplification has been dominated by supervised learning approaches thanks to the appearance of large parallel datasets such as Wikilarge and Newsela. However, these datasets suffer from sentence pairs with factuality errors which compromise the models’ performance. So, we proposed a model-independent factuality error detection mechanism, considering bad simplification and bad alignment, to refine the Wikilarge dataset through reducing the weight of these samples during training. We demonstrated that this approach improved the performance of the state-of-the-art text simplification model TST5 by an FKGL reduction of 0.33 and 0.29 on the TurkCorpus and ASSET testing datasets respectively. 
Our study illustrates the impact of erroneous samples in TS datasets and highlights the need for automatic methods to improve their quality.\",\"PeriodicalId\":247582,\"journal\":{\"name\":\"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2022.tsar-1.16\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.tsar-1.16","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In the past few years, the field of text simplification has been dominated by supervised learning approaches, owing to the availability of large parallel datasets such as Wikilarge and Newsela. However, these datasets contain sentence pairs with factuality errors, which compromise model performance. We therefore propose a model-independent factuality error detection mechanism that targets two error types, bad simplification and bad alignment, and use it to refine the Wikilarge dataset by reducing the weight of flagged samples during training. We demonstrate that this approach improves the performance of the state-of-the-art text simplification model TST5, reducing FKGL by 0.33 on the TurkCorpus test set and by 0.29 on the ASSET test set. Our study illustrates the impact of erroneous samples in text simplification (TS) datasets and highlights the need for automatic methods to improve their quality.
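The refinement mechanism described in the abstract amounts to per-sample loss weighting: flagged pairs stay in the training set but contribute less to the gradient than clean pairs. The paper's code is not reproduced here, so the following is only a minimal Python/PyTorch sketch of such a scheme; the detector heuristic, the weight value, and all names (is_suspect, SUSPECT_WEIGHT, weighted_loss) are illustrative assumptions, not the authors' actual implementation.

import torch
import torch.nn.functional as F

SUSPECT_WEIGHT = 0.3  # assumption: down-weight factor for flagged pairs

def is_suspect(complex_sent: str, simple_sent: str) -> bool:
    """Hypothetical stand-in for a factuality error detector.

    Flags a pair as a likely bad alignment/bad simplification when the
    'simplification' shares almost no vocabulary with its source; the
    paper's own detector uses its own criteria for these two error types.
    """
    src = set(complex_sent.lower().split())
    tgt = set(simple_sent.lower().split())
    overlap = len(src & tgt) / max(len(tgt), 1)
    return overlap < 0.2  # illustrative threshold

def weighted_loss(logits, targets, weights, pad_id=0):
    """Token-level cross-entropy, scaled per sample by the detector weight.

    logits:  (batch, seq_len, vocab) model outputs
    targets: (batch, seq_len) gold token ids
    weights: (batch,) 1.0 for clean pairs, SUSPECT_WEIGHT for flagged ones
    """
    # Per-token loss; padding positions contribute zero via ignore_index.
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, ignore_index=pad_id, reduction="none"
    )  # (batch, seq_len)
    mask = (targets != pad_id).float()
    # Average over real tokens, then scale each sample by its weight.
    per_sample = (per_token * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return (per_sample * weights).mean()

In this sketch, the weights would be computed once over Wikilarge before training, e.g. weights = torch.tensor([1.0 if not is_suspect(c, s) else SUSPECT_WEIGHT for c, s in pairs]), which keeps the detection step fully model-independent, as the abstract emphasizes.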