Kannada Textual Error Correction Using T5 Model
Authors: Sushmitha Ramaneedi, P. Pati
Venue: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT)
Publication date: 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126228 (https://doi.org/10.1109/I2CT57861.2023.10126228)
Errors creep into text in various ways. Typing errors may arise from mistyping or from limited proficiency in the language. Similarly, recognition technologies that convert textual images and speech into text may introduce errors because of their own limitations. Whatever the channel through which errors are introduced, their presence poses a major challenge for downstream consumption of the text. Errors in Indian-language documents also come with their own set of issues. This calls for a focused study of textual errors in Indian-language documents and of the technologies that may be employed to eliminate them.

This work proposes to employ mT5, a widely used deep-learning-based multilingual language model, to eliminate errors in text written in Kannada, an Indian language. A pretrained mT5 model is adapted to a Kannada dataset through transfer learning. The ability of the adapted mT5 model to reduce errors is studied at various noise levels, with Character Error Rate (CER) as the metric. The adapted mT5 model is observed to reduce noise by 12% for input text with a 25% CER.
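The abstract names Character Error Rate as the metric but does not spell out how it is computed. The following is a minimal sketch under the usual definition, assuming CER is the Levenshtein edit distance between reference and hypothesis divided by the reference length, counted over Unicode code points. The function names `edit_distance` and `cer` are illustrative and do not come from the paper. The paper may count Kannada characters differently, for example as grapheme clusters.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted over code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalised by reference length.

    Assumption: the unit is the Unicode code point, so a Kannada consonant
    and its dependent vowel sign count as two characters.
    """
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substitution in a four-character reference gives a CER of 0.25.
    print(cer("abcd", "abxd"))
```

Under this definition, the effect of a correction model on a sample could be measured by comparing `cer(clean, noisy)` with `cer(clean, corrected)`, where `clean`, `noisy` and `corrected` are hypothetical variable names for the ground truth, the erroneous input and the model output.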