Template-based text field segmentation for ID documents using dynamic squeezeboxes packing
Michael Zingerenko, Elena Limonova, Vladimir V. Arlazarov
Multimedia Tools and Applications, published 2024-09-18. DOI: 10.1007/s11042-024-20162-6 (https://doi.org/10.1007/s11042-024-20162-6)
Abstract
This paper addresses text field segmentation in identity documents. Because these documents have fixed layouts, they are well suited to computationally efficient template-based algorithms. We consider the Dynamic Squeezeboxes Packing method and demonstrate its integration into document recognition systems using a single sample per document type. We benchmark text field segmentation on the public MIDV-2019 dataset using the standard intersection-over-union metric and our custom intersection-over-template metric, and we also measure processing time. Dynamic Squeezeboxes Packing maintains competitive quality compared with text-in-the-wild detection methods (EAST, CRAFT) and a named-entity recognition method (LayoutLMv2). A significant advantage of the method is its processing speed: it averages 9 ms per image on the x86_64 platform, roughly two orders of magnitude faster than EAST (980 ms), CRAFT (2030 ms), and LayoutLMv2 (2210 ms). These results suggest that the method has strong potential for document image analysis, particularly for processing identity documents.
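For readers unfamiliar with the standard intersection-over-union metric named in the abstract, the following is a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); the paper may evaluate on general quadrilaterals, and the `Box` alias and `iou` function name are illustrative, not taken from the paper. The custom intersection-over-template metric is not defined in this abstract, so it is not sketched here.

```python
from typing import Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs get a score of 0 rather than dividing by zero.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(iou(predicted, ground_truth))
```

A score of 1 means the predicted field coincides with the ground-truth field, and 0 means the two do not overlap.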
About the journal:
Multimedia Tools and Applications publishes original research articles on multimedia development and system support tools as well as case studies of multimedia applications. It also features experimental and survey articles. The journal is intended for academics, practitioners, scientists and engineers who are involved in multimedia system research, design and applications. All papers are peer reviewed.
Specific areas of interest include:
- Multimedia Tools:
- Multimedia Applications:
- Prototype multimedia systems and platforms