Grammatical Error Correction: A Survey of the State of the Art

IF 5.3 · Zone 2, Computer Science

Computational Linguistics · Publication date: 2023-09-01 · DOI: 10.1162/coli_a_00478

Christopher Bryant, Zheng Yuan, Muhammad Reza Qorib, Hannan Cao, Hwee Tou Ng, Ted Briscoe


Citations: 30

Abstract

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task includes the correction not only of grammatical errors, such as missing prepositions and mismatched subject–verb agreement, but also of orthographic and semantic errors, such as misspellings and word choice errors, respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, the current dominant state of the art. In this survey paper, we condense the field into a single article: we first outline some of the linguistic challenges of the task, introduce the most popular datasets available to researchers (for both English and other languages), and summarize the various methods and techniques that have been developed, with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgments, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.




Source journal: Computational Linguistics (Computer Science-Artificial Intelligence)

Self-citation rate: 0.00%


Journal description: Computational Linguistics is the longest-running publication devoted exclusively to the computational and mathematical properties of language and the design and analysis of natural language processing systems. This highly regarded quarterly offers university and industry linguists, computational linguists, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, and philosophers the latest information about the computational aspects of all facets of research on language.