Denoising Sequence-to-Sequence Modeling for Removing Spelling Mistakes

Shuvendu Roy
2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), pp. 1-5, May 2019
DOI: 10.1109/ICASERT.2019.8934902
Citations: 4

Abstract

Rule-based spelling correction systems focus on finding the word that best matches a misspelled word. However, this approach does not work well for a sentence with multiple errors, where many combinations of candidate replacement words are possible but only one of them yields the correct sentence. Correcting each word individually therefore introduces errors, so a spelling correction system must understand the context of the sentence, including the tense, the gender of the subject, and so on. The best-known example of typing-mistake correction is the one Google provides in its search engine. It was introduced quite a while ago, but no one else has developed a system that performs comparably well. In this work, we propose a spelling correction system based on deep learning. The basic intuition of our approach comes from the denoising autoencoder: we train the model on noisy input generated by changing, removing, or adding characters at random positions in the sequence, and the model's job is to map this noisy input back to the original error-free sequence. We experimented with a large English dataset and report performance in terms of character-level accuracy. The proposed model shows impressive results in correcting spelling mistakes.
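The noise-injection step and the character-level accuracy metric described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the replacement alphabet, the number of edits per sentence (`n_edits`), the hypothetical helper names `add_noise`, `make_training_pairs` and `char_accuracy`, and the exact definition of character-level accuracy (position-wise matches divided by the length of the longer string) are all assumptions, since the abstract does not specify them. The sequence-to-sequence model itself is not shown.

```python
import random
import string
from typing import List, Optional, Tuple

# Assumption: the paper does not state which characters are used for noise.
ALPHABET = string.ascii_lowercase + " "


def add_noise(text: str, n_edits: int = 1, rng: Optional[random.Random] = None) -> str:
    """Corrupt `text` with `n_edits` random character-level edits.

    Each edit either substitutes, deletes, or inserts one character
    at a random position, mirroring the three corruption types in the abstract.
    """
    if rng is None:
        rng = random.Random()
    chars = list(text)
    for _ in range(n_edits):
        op = rng.choice(("substitute", "delete", "insert"))
        if op == "insert" or not chars:
            # Insertion is valid in any of the len(chars) + 1 gaps, including the end.
            pos = rng.randint(0, len(chars))
            chars.insert(pos, rng.choice(ALPHABET))
        elif op == "delete":
            del chars[rng.randrange(len(chars))]
        else:
            pos = rng.randrange(len(chars))
            # Choose a different character so the substitution is a real change.
            chars[pos] = rng.choice([c for c in ALPHABET if c != chars[pos]])
    return "".join(chars)


def make_training_pairs(sentences: List[str], n_edits: int = 2,
                        seed: int = 0) -> List[Tuple[str, str]]:
    """Build (noisy input, clean target) pairs for a denoising seq2seq model."""
    rng = random.Random(seed)
    return [(add_noise(s, n_edits, rng), s) for s in sentences]


def char_accuracy(predicted: str, target: str) -> float:
    """Assumed metric: matching aligned characters over the longer length.

    Normalizing by the longer string means both missing and extra
    characters in the prediction count as errors.
    """
    longest = max(len(predicted), len(target))
    if longest == 0:
        return 1.0
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / longest
```

For example, `make_training_pairs(["the cat sat on the mat"])` returns one pair whose first element is a corrupted copy of the sentence and whose second element is the original, which is the input/target format a denoising model is trained on. Fixing the seed keeps the corrupted dataset reproducible across runs.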