Word Length-Aware Text Spotting: Enhancing Dense Text Detection and Recognition for Camera-Captured Document Image
Authors: Hao Wang; Huabing Zhou; Yanduo Zhang; Jiayi Ma; Haibin Ling
Journal: IEEE Transactions on Instrumentation and Measurement, vol. 74, pp. 1-15 (Journal Article)
Published: 2025-04-15
DOI: 10.1109/TIM.2025.3560748
URL: https://ieeexplore.ieee.org/document/10965826/
Impact factor: 5.6; JCR: Q1 (Engineering, Electrical & Electronic); Region: 2 (Engineering Technology)
Citations: 0
Abstract
Text spotting in camera-captured document images faces significant challenges, especially with dense text of variable lengths. Existing approaches falter with the long-tailed distribution of word lengths, leading to decreased performance on words with extreme lengths. To address this issue, we present WordLenSpotter, an end-to-end framework incorporating word length awareness to improve detection and recognition across a wide range of word lengths. Our method utilizes a dilated convolutional fusion module in its image encoder and a transformer framework for joint detection and recognition guided by word length priors. Our innovations include a spatial length predictor (SLP) and a length-aware segmentation (LenSeg) proposal head, enhancing the model’s sensitivity to the spatial distribution of text. Evaluated on our newly constructed DSTD1500 dataset and existing public datasets with dense text, WordLenSpotter demonstrates superior text spotting capabilities, especially in handling the diversity of word lengths in dense text scenes. The code is available at https://github.com/unxiaohao/WordLenSpotter
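The long-tail issue the abstract describes can be made concrete with a small diagnostic. What follows is a minimal sketch, not the authors' evaluation code: it counts words per character length and reports exact-match recognition accuracy per length bucket. The function names, the bucket thresholds (`short_max`, `long_min`), and the toy word pairs are all assumptions made up for illustration; none of them come from the paper or its repository.

```python
from collections import Counter


def length_histogram(words):
    """Count how many words fall at each character length."""
    return Counter(len(w) for w in words)


def accuracy_by_length_bucket(pairs, short_max=2, long_min=10):
    """Exact-match accuracy per ground-truth length bucket.

    `pairs` is an iterable of (ground_truth, prediction) strings.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    totals = Counter()
    correct = Counter()
    for truth, pred in pairs:
        n = len(truth)
        if n <= short_max:
            bucket = "short"
        elif n >= long_min:
            bucket = "long"
        else:
            bucket = "medium"
        totals[bucket] += 1
        if truth == pred:
            correct[bucket] += 1
    return {b: correct[b] / totals[b] for b in totals}


# Toy input, invented purely to exercise the functions (not real results).
toy_pairs = [
    ("a", "a"),
    ("of", "or"),
    ("text", "text"),
    ("dense", "dense"),
    ("recognition", "recogniton"),
]
hist = length_histogram(truth for truth, _ in toy_pairs)
acc = accuracy_by_length_bucket(toy_pairs)
```

Run on a spotter's real outputs, a breakdown like this would show whether errors concentrate in the very short and very long buckets, which is the behaviour the abstract attributes to existing approaches.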
About the journal:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.