A novel domain independent scene text localizer
Journal: Pattern Recognition (Impact Factor 7.5; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2024-09-15
DOI: 10.1016/j.patcog.2024.111015
URL: https://www.sciencedirect.com/science/article/pii/S0031320324007660
Citations: 0
Abstract
Text localization across multiple domains is crucial for applications such as autonomous driving and tracking marathon runners. This work introduces DIPCYT, a novel model that combines Domain Independent Partial Convolution with a Yolov5-based Transformer for text localization in scene images from several domains, including natural scenes, underwater images, and drone images. Each domain presents its own challenges: underwater images suffer from poor quality and degradation, drone images contain tiny text with loss of shape, and natural scene images contain arbitrarily oriented and arbitrarily shaped text. Additionally, license plates in drone images may carry less semantic information than other text types because contextual information between characters is lost. To tackle these challenges, DIPCYT employs new partial convolution layers within Yolov5 and integrates Transformer detection heads with a novel Fourier Positional Convolutional Block Attention Module (FPCBAM). This approach leverages text properties that are common across domains, such as contextual (global) and spatial (local) relationships. Experimental results show that DIPCYT outperforms existing methods, achieving F-scores of 0.90, 0.90, 0.77, 0.85, 0.85, and 0.88 on the Total-Text, ICDAR 2015, ICDAR 2019 MLT, CTW1500, Drone, and Underwater datasets, respectively.
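For readers unfamiliar with the metric, the F-scores reported above are conventionally the harmonic mean of detection precision and recall. The abstract does not state the matching protocol or the underlying counts, so the following is only a minimal sketch of the standard formula; the function name `f_score` and the example counts are hypothetical and are not taken from the paper.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F-score (harmonic mean of precision and recall).

    The counts are assumed to come from matching predicted text regions
    against ground-truth regions; how a match is decided (for example,
    an overlap threshold) is outside the scope of this sketch.
    """
    detected = true_positives + false_positives        # all predicted regions
    ground_truth = true_positives + false_negatives    # all annotated regions
    if detected == 0 or ground_truth == 0:
        return 0.0
    precision = true_positives / detected
    recall = true_positives / ground_truth
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the calculation.
example = f_score(true_positives=90, false_positives=10, false_negatives=10)
print(round(example, 2))
```

Because the harmonic mean is pulled toward the lower of the two values, a detector cannot reach a high F-score by maximizing only precision or only recall.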
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.