Title: Heterogeneous models ensemble for Chinese grammatical error correction
Authors: Yeling Liang, Lin Li
DOI: 10.1117/12.2667512
Published in: International Conference on Artificial Intelligence, Virtual Reality, and Visualization, 2023-03-01
Citations: 1
Abstract
Grammatical error correction (GEC) aims to automatically identify and correct grammatical errors in a sentence. Neural machine translation (NMT) models are the mainstream approach to the GEC task. However, these models require a large amount of data to be adequately trained, and the variety of grammatical errors and the dependencies between errors in a sentence make it difficult for a single NMT model to correct multiple errors at once. In this work, we propose an ensemble approach for heterogeneous models, which integrates rule-based, NMT, and pre-trained language model-based GEC models through a recurrent generation approach. This approach exploits the strengths of each model and covers a wider range of errors in a sentence. We also mitigate the scarcity of task-specific data for the GEC task through data augmentation. We conduct extensive experiments on the NLPCC2018 shared task dataset to demonstrate the effectiveness of our proposed methods, reaching an F0.5 score of 37.26 and outperforming the best model in the shared task.
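The recurrent generation idea described in the abstract can be sketched roughly as follows: the sentence is passed through each corrector in turn, and the whole pipeline repeats until no model makes a further change (or an iteration cap is reached). This is a minimal illustration, not the authors' implementation; the three corrector functions are hypothetical stand-ins for the rule-based, NMT, and pre-trained language model components.

```python
def rule_based_correct(sentence: str) -> str:
    # Hypothetical rule-based corrector: fix one known error pattern
    # (here, a duplicated particle) as a toy example.
    return sentence.replace("他们的的", "他们的")

def nmt_correct(sentence: str) -> str:
    # Stand-in for a sequence-to-sequence (NMT) GEC model.
    return sentence

def plm_correct(sentence: str) -> str:
    # Stand-in for a pre-trained language model-based corrector.
    return sentence

def ensemble_correct(sentence: str, max_rounds: int = 3) -> str:
    """Recurrently feed the output of one model into the next until the
    sentence stops changing, so each model can fix the error types it
    handles best."""
    correctors = [rule_based_correct, nmt_correct, plm_correct]
    for _ in range(max_rounds):
        previous = sentence
        for correct in correctors:
            sentence = correct(sentence)
        if sentence == previous:  # converged: no model changed anything
            break
    return sentence
```

For example, `ensemble_correct("他们的的书很好")` would return `"他们的书很好"` after the rule-based stage fires on the first round and no model changes the sentence on the second.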