{"title":"W-A net: Leveraging Atrous and Deformable Convolutions for Efficient Text Detection","authors":"Sukhad Anand, Z. Khan","doi":"10.1109/DICTA51227.2020.9363428","DOIUrl":null,"url":null,"abstract":"Scene text detection has been gaining a lots of focus in research. Even though the recent methods are able to detect text in complex background having complex shapes with a fairly good accuracy, they still suffer from issues of limited receptive field. These fail from detecting extremely short or long words hence failing in detecting text words precisely in document text images. We propose a new model which we call W-A net, because of it's W shape with the middle branch being Atrous convolutional layers. Our model predicts a segmentation map which divides the image into word and no word regions and also, a boundary map which helps to segregate closer words from each other. We use Atrous convolutions and Deformable convolutional layers to increase the receptive field which helps to detect long words in an image. We treat text detection problem as a single problem irrespective of the background, making our model suitable of detecting text in scene or document images. We present our findings on two scene text datasets and a receipt dataset. Our results show that our method performs better than recent scene text detection methods which perform poorly on document text images, especially receipt images with short words.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 Digital Image Computing: Techniques and Applications (DICTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DICTA51227.2020.9363428","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Scene text detection has been gaining a lots of focus in research. Even though the recent methods are able to detect text in complex background having complex shapes with a fairly good accuracy, they still suffer from issues of limited receptive field. These fail from detecting extremely short or long words hence failing in detecting text words precisely in document text images. We propose a new model which we call W-A net, because of it's W shape with the middle branch being Atrous convolutional layers. Our model predicts a segmentation map which divides the image into word and no word regions and also, a boundary map which helps to segregate closer words from each other. We use Atrous convolutions and Deformable convolutional layers to increase the receptive field which helps to detect long words in an image. We treat text detection problem as a single problem irrespective of the background, making our model suitable of detecting text in scene or document images. We present our findings on two scene text datasets and a receipt dataset. Our results show that our method performs better than recent scene text detection methods which perform poorly on document text images, especially receipt images with short words.