语法纠错合成数据生成方法的比较研究

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2020-07-01 DOI:10.18653/v1/2020.bea-1.21

Max White, Alla Rozovskaya

{"title":"语法纠错合成数据生成方法的比较研究","authors":"Max White, Alla Rozovskaya","doi":"10.18653/v1/2020.bea-1.21","DOIUrl":null,"url":null,"abstract":"Grammatical Error Correction (GEC) is concerned with correcting grammatical errors in written text. Current GEC systems, namely those leveraging statistical and neural machine translation, require large quantities of annotated training data, which can be expensive or impractical to obtain. This research compares techniques for generating synthetic data utilized by the two highest scoring submissions to the restricted and low-resource tracks in the BEA-2019 Shared Task on Grammatical Error Correction.","PeriodicalId":363390,"journal":{"name":"Workshop on Innovative Use of NLP for Building Educational Applications","volume":"2016 18","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"A Comparative Study of Synthetic Data Generation Methods for Grammatical Error Correction\",\"authors\":\"Max White, Alla Rozovskaya\",\"doi\":\"10.18653/v1/2020.bea-1.21\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Grammatical Error Correction (GEC) is concerned with correcting grammatical errors in written text. Current GEC systems, namely those leveraging statistical and neural machine translation, require large quantities of annotated training data, which can be expensive or impractical to obtain. This research compares techniques for generating synthetic data utilized by the two highest scoring submissions to the restricted and low-resource tracks in the BEA-2019 Shared Task on Grammatical Error Correction.\",\"PeriodicalId\":363390,\"journal\":{\"name\":\"Workshop on Innovative Use of NLP for Building Educational Applications\",\"volume\":\"2016 18\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Innovative Use of NLP for Building Educational Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2020.bea-1.21\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Innovative Use of NLP for Building Educational Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2020.bea-1.21","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

摘要

语法错误纠正(GEC)是指纠正书面文本中的语法错误。当前的GEC系统，即那些利用统计和神经机器翻译的系统，需要大量带注释的训练数据，这些数据可能昂贵或难以获得。本研究比较了在BEA-2019语法错误纠正共享任务中，两个得分最高的提交物所使用的生成合成数据的技术，以及限制和低资源的轨道。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Comparative Study of Synthetic Data Generation Methods for Grammatical Error Correction

Grammatical Error Correction (GEC) is concerned with correcting grammatical errors in written text. Current GEC systems, namely those leveraging statistical and neural machine translation, require large quantities of annotated training data, which can be expensive or impractical to obtain. This research compares techniques for generating synthetic data utilized by the two highest scoring submissions to the restricted and low-resource tracks in the BEA-2019 Shared Task on Grammatical Error Correction.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Workshop on Innovative Use of NLP for Building Educational Applications

自引率

0.00%

发文量