Generation of Images with Handwritten Text in Russian

A. Bogatenkova, O.V. Belyaeva, A. Perminov
Trudy Instituta sistemnogo programmirovaniia RAN, 2023. DOI: 10.15514/ispras-2023-35(2)-2
Citations: 0

Abstract

Automatic handwriting recognition is an important component of electronic document analysis, but existing solutions are still far from ideal. One of the main reasons handwriting recognition remains difficult is the insufficient amount of data available for training recognition models. For Russian the problem is more acute, and it is exacerbated by the wide variety of complex handwriting styles. This paper explores how several methods of generating additional training data affect the quality of recognition models: a method based on handwritten fonts, the StackMix method, which assembles words from images of individual characters, and a generative adversarial network. This work develops and describes a font-based method for creating images of handwritten text in Russian. In addition, it proposes an algorithm for building a new Cyrillic handwritten font from existing images of handwritten characters. The effectiveness of the developed method was tested in experiments on two publicly available Cyrillic datasets with two different recognition models. The results showed that the generated images increased handwriting recognition accuracy by 6% on average, which is comparable to the results of other, more complex methods. The source code of the experiments and of the proposed method, as well as the datasets generated during the experiments, are publicly available for download.
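To make the idea of "gluing words from symbols" concrete, here is a minimal sketch of assembling a word image from per-character crops with NumPy. It is not the StackMix implementation and not the authors' font-based method; it only illustrates the concatenation step. It assumes the character crops already exist as grayscale `uint8` arrays of equal height (the `glyphs` dictionary, the `glue_word` function and the `max_gap` parameter are hypothetical names introduced for illustration), and it does not model how StackMix actually obtains those crops.

```python
import numpy as np


def glue_word(word, glyphs, rng, max_gap=3, background=255):
    """Build a word image by gluing per-character images left to right.

    Hypothetical input: `glyphs` maps each character to a list of 2-D
    uint8 grayscale crops that all share the same height.
    """
    pieces = []
    for i, ch in enumerate(word):
        variants = glyphs[ch]
        # Pick one handwritten variant of this character at random.
        crop = variants[rng.integers(len(variants))]
        if i > 0:
            # Insert a blank gap of random width (0..max_gap) between characters.
            gap = int(rng.integers(0, max_gap + 1))
            pieces.append(np.full((crop.shape[0], gap), background, dtype=np.uint8))
        pieces.append(crop)
    # All pieces have the same height, so they can be stacked horizontally.
    return np.hstack(pieces)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = 32
    # Dummy black rectangles stand in for real character crops.
    glyphs = {
        "д": [np.zeros((h, 10), dtype=np.uint8)],
        "а": [np.zeros((h, 8), dtype=np.uint8), np.zeros((h, 9), dtype=np.uint8)],
    }
    img = glue_word("да", glyphs, rng)
    print(img.shape)
```

Randomizing both the character variant and the inter-character gap is what lets a small pool of crops yield many distinct word images; real pipelines would typically also vary baseline, scale and slant.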