Jhonatas S. Conceição, A. Pinto, L. G. L. Decker, Jose Luis Flores Campana, Manuel Alberto Cordova Neira, Andrezza A. Dos Santos, H. Pedrini, R. Torres
Anais Estendidos da Conference on Graphics, Patterns and Images (SIBGRAPI), published 2019-10-09. DOI: 10.5753/sibgrapi.est.2019.8333 (https://doi.org/10.5753/sibgrapi.est.2019.8333)
Multi-Lingual Text Localization via Language-Specific Convolutional Neural Networks
Scene text localization and recognition is a topic in computer vision that aims to delimit candidate regions in an input image that contain incidental scene text. The challenge lies in devising detectors that can cope with wide variability in font size, font style, color, background complexity, and language, among other factors. This work compares two strategies for building classification models, both based on a Convolutional Neural Network, to detect text in multiple languages in images: (i) a classification model built in a multi-lingual training scenario; and (ii) a classification model built in a language-specific training scenario. The experiments indicate that the language-specific model outperforms the model trained in the multi-lingual scenario, with improvements of 14.79%, 8.94%, and 11.43% in precision, recall, and F-measure, respectively.
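The three metrics named in the abstract are related by standard definitions, which the following minimal sketch illustrates. It assumes detections have already been matched against ground-truth text regions and counted as true positives, false positives, and false negatives; the abstract does not state the matching criterion used, and it is also an assumption here that F-measure means the balanced harmonic mean (F1). The function name and the counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and balanced F-measure from detection counts.

    tp: detected regions that match a ground-truth text region
    fp: detected regions that match no ground-truth text region
    fn: ground-truth text regions that no detection matched
    """
    # Precision: fraction of detections that are correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: fraction of ground-truth regions that were found.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # F-measure (assumed here to be F1): harmonic mean of the two.
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
print(f"precision={p:.4f} recall={r:.4f} f_measure={f:.4f}")
```

Because the F-measure is a harmonic mean, it always lies between precision and recall and sits closer to the lower of the two, which is consistent with the reported F-measure gain falling between the precision and recall gains.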