Ivan Dorkic, Matteo Brisinello, R. Grbić, M. Herceg
2023 46th MIPRO ICT and Electronics Convention (MIPRO), published 2023-05-22
DOI: 10.23919/MIPRO57284.2023.10159759 (https://doi.org/10.23919/MIPRO57284.2023.10159759)
Influence of quality of pixel level annotations on text detection performance in natural images
Text detection in natural images is a task that arises in many computer vision applications. State-of-the-art text detection methods are mainly based on deep neural networks designed for instance segmentation. However, most of the available text detection datasets lack the fine pixel-level annotations that supervised training of such networks requires. Usually, a whole or reduced text bounding box is used as the segmentation mask instead. In this paper, a method that generates a synthetic dataset with precise pixel-level annotations is proposed. The method is based on the available SynthText script for generating synthetic datasets with text instances. By creating synthetic datasets with both precise and coarse pixel-level annotations, we examine how annotation quality affects the performance of the state-of-the-art text detector TextFuseNet.
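To make the difference between coarse and precise annotations concrete, here is a minimal sketch in plain Python. It is not the authors' generation method or any part of the SynthText script. The function names `box_mask` and `mask_iou`, the `shrink` parameter, and the toy glyph mask are all hypothetical and chosen only for illustration. It builds a coarse mask from a whole or reduced axis-aligned bounding box and measures how far each is from a precise glyph-level mask.

```python
def box_mask(height, width, box, shrink=0.0):
    """Return a height x width 0/1 mask filled inside an axis-aligned box.

    box is (x0, y0, x1, y1) with x1 and y1 exclusive. shrink in [0, 1)
    removes that fraction of the box width and height, split evenly
    between the two opposite sides (an assumed way to "reduce" a box).
    """
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * shrink / 2.0
    dy = (y1 - y0) * shrink / 2.0
    sx0, sx1 = round(x0 + dx), round(x1 - dx)
    sy0, sy1 = round(y0 + dy), round(y1 - dy)
    return [[1 if sx0 <= x < sx1 and sy0 <= y < sy1 else 0
             for x in range(width)]
            for y in range(height)]


def mask_iou(a, b):
    """Intersection over union of two equally sized 0/1 masks."""
    inter = sum(p & q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    union = sum(p | q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return inter / union if union else 0.0


# Toy "precise" annotation: only the pixels of a hand-drawn letter "I".
precise = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Coarse annotations: the whole bounding box and a reduced version of it.
whole = box_mask(6, 7, (1, 1, 6, 5))
reduced = box_mask(6, 7, (1, 1, 6, 5), shrink=0.4)

print("whole box vs precise IoU:  ", mask_iou(whole, precise))
print("reduced box vs precise IoU:", mask_iou(reduced, precise))
```

In this toy case the whole box labels background pixels around the glyph as text, while the reduced box misses most of the glyph's strokes, so neither coarse mask matches the precise one. Real text instances are often rotated or curved, which this axis-aligned sketch does not model.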