A Deep Learning model capable of producing heatmap probabilities for Characters in Natural Scenes.

Proceedings of the 2021 International Conference on Pattern Recognition and Intelligent Systems. Pub Date: 2021-07-28. DOI: 10.1145/3480651.3480662

Allen Joshey, Ashish Tiwari, Rakesh Sankar, Sahil Salim Makandar

Citations: 0

Abstract

Text in natural settings comes in all shapes, sizes, and textures. Classical methods have often failed to accurately extract the text present in natural scenes. Text in the wild is organized hierarchically into sentences, words, and characters. Methods for detecting text in everyday real-world scenes have found success, but most available real-world datasets are annotated at the word or line level, which limits detection to words rather than characters. Inspired by Naver Labs' work on CRAFT [2] and by Microsoft Research and Baidu Research's work on WordSup [5], both of which train models in a weakly supervised manner to obtain character-level predictions, we propose a computationally efficient architecture capable of providing similar results. Once our model, trained on synthetic text, can produce character-level annotations, it can be fine-tuned for text appearing in natural settings. The methods discussed prove robust enough to identify curved or somewhat deformed text in natural settings. Our approach generates probability maps for the locations of characters and for the gaps between the characters that constitute a word, which makes it easier to localize both characters and words. Our method shows results comparable to CRAFT [2] with only 30% of the learnable parameters.
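The abstract does not say how the character and gap probability maps are constructed, so the following is only a minimal sketch of one common way to build such targets: an isotropic-style 2D Gaussian placed at each character center, and a second Gaussian placed midway between neighbouring characters. The function names (`gaussian_blob`, `char_and_gap_heatmaps`), the axis-aligned box format, and the choice of sigma as a fraction of box size are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def gaussian_blob(height, width, cx, cy, sigma_x, sigma_y):
    """Return a (height, width) map holding a 2D Gaussian that peaks at 1.0 at (cx, cy)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-(((xs - cx) ** 2) / (2 * sigma_x ** 2)
                    + ((ys - cy) ** 2) / (2 * sigma_y ** 2)))


def char_and_gap_heatmaps(shape, char_boxes):
    """Build character and gap heatmaps for one word.

    char_boxes: axis-aligned boxes (x0, y0, x1, y1) in reading order.
    Assumes every box has non-zero width and height (hypothetical input format).
    """
    height, width = shape
    char_map = np.zeros(shape)
    gap_map = np.zeros(shape)
    centers = []
    for x0, y0, x1, y1 in char_boxes:
        w, h = x1 - x0, y1 - y0
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        centers.append((cx, cy, w, h))
        # Assumed spread: a quarter of the box size along each axis.
        blob = gaussian_blob(height, width, cx, cy, w / 4.0, h / 4.0)
        char_map = np.maximum(char_map, blob)
    # One gap blob per pair of neighbouring characters, centred between them.
    for (ax, ay, aw, ah), (bx, by, bw, bh) in zip(centers, centers[1:]):
        mx, my = (ax + bx) / 2.0, (ay + by) / 2.0
        blob = gaussian_blob(height, width, mx, my, (aw + bw) / 8.0, (ah + bh) / 8.0)
        gap_map = np.maximum(gap_map, blob)
    return char_map, gap_map


# Three made-up character boxes laid out left to right on a 32x48 canvas.
boxes = [(4, 8, 14, 24), (16, 8, 26, 24), (28, 8, 38, 24)]
char_map, gap_map = char_and_gap_heatmaps((32, 48), boxes)
```

In this sketch the character map peaks at each box center and the gap map peaks between neighbouring boxes, so thresholding the first isolates characters while thresholding the combination of both links them into a word. Rotated or curved text would need quadrilateral boxes and a perspective warp of the Gaussian, which is omitted here.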
