Allen Joshey, Ashish Tiwari, Rakesh Sankar, Sahil Salim Makandar
A Deep Learning model capable of producing heatmap probabilities for Characters in Natural Scenes
Proceedings of the 2021 International Conference on Pattern Recognition and Intelligent Systems, published 2021-07-28
DOI: 10.1145/3480651.3480662 (https://doi.org/10.1145/3480651.3480662)
Abstract
Text in natural settings comes in all shapes, sizes, and textures. Classical methods have often failed to accurately extract the text present in naturally occurring scenes. Text in the wild is organized hierarchically into sentences, words, and characters. Methods for detecting text in everyday real-world scenes have found success. Most available real-world datasets, however, are annotated at the word or line level, which limits detection to words rather than characters. Inspired by the work of Naver Labs on CRAFT [2] and by Microsoft Research and Baidu Research's work on WordSup [5], both of which train models in a weakly supervised manner to obtain character-level predictions, we propose a computationally efficient architecture capable of providing similar results. Once our model has been trained on synthetic text to produce character-level annotations, it can be fine-tuned on text appearing in natural settings. The methods discussed prove robust enough to identify curved or somewhat deformed text in natural settings. Our approach generates probabilities for the locations of characters and for the gaps between the characters that constitute a word, making it easier to localize both characters and words. Our method achieves results comparable to CRAFT [2] with only 30% of the learnable parameters.
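The abstract does not say how the character and gap probability maps are constructed as training targets. As a rough illustration only, the sketch below assumes a simplified scheme in the spirit of Gaussian heatmap targets: each axis-aligned character box is painted with a 2D Gaussian, and a second map is painted over boxes that span the centres of consecutive characters. The function names, the `sigma_scale` value, and the example boxes are all hypothetical and are not taken from the paper.

```python
import math


def gaussian_patch(height, width, sigma_scale=0.25):
    """Return a height x width grid of values in (0, 1], largest at the box centre."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    # sigma_scale is an assumed hyperparameter, not a value from the paper.
    sy, sx = max(height * sigma_scale, 1e-6), max(width * sigma_scale, 1e-6)
    return [
        [math.exp(-0.5 * (((y - cy) / sy) ** 2 + ((x - cx) / sx) ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def render_boxes(shape, boxes):
    """Paint one Gaussian per (x0, y0, x1, y1) box, keeping the per-pixel maximum."""
    h, w = shape
    heatmap = [[0.0] * w for _ in range(h)]
    for x0, y0, x1, y1 in boxes:
        patch = gaussian_patch(y1 - y0, x1 - x0)
        for dy, row in enumerate(patch):
            for dx, value in enumerate(row):
                y, x = y0 + dy, x0 + dx
                if 0 <= y < h and 0 <= x < w:
                    heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


def gap_boxes(char_boxes):
    """Build boxes that span the centres of consecutive characters in one word."""
    gaps = []
    for (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) in zip(char_boxes, char_boxes[1:]):
        gaps.append(((ax0 + ax1) // 2, min(ay0, by0),
                     (bx0 + bx1) // 2, max(ay1, by1)))
    return gaps


# Hypothetical word made of three character boxes; x1 and y1 are exclusive.
chars = [(2, 2, 9, 13), (10, 2, 17, 13), (18, 2, 25, 13)]
char_map = render_boxes((16, 28), chars)            # character-location map
gap_map = render_boxes((16, 28), gap_boxes(chars))  # between-character map
```

In this toy setup the character map peaks at each character centre and drops to zero in the one-pixel space between boxes, while the gap map is high exactly there. Having both signals is what lets characters be separated from one another and also linked into words. A real pipeline would need to handle rotated or curved boxes, which this axis-aligned sketch ignores.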