Edge Approximation Text Detector

Impact factor 11.1 | Zone 1, Engineering & Technology | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology | Pub Date: 2025-04-07 | DOI: 10.1109/TCSVT.2025.3558634

Chuang Yang, Xu Han, Tao Han, Han Han, Bingxuan Zhao, Qi Wang

Volume 35, Issue 9, pp. 9234-9245 | IEEE Xplore: https://ieeexplore.ieee.org/document/10955237/

Citations: 0

Abstract

Pursuing efficient text shape representations helps scene text detection models focus on compact foreground regions and optimize the contour reconstruction steps, simplifying the whole detection pipeline. Current approaches either represent irregular shapes via a box-to-polygon strategy or decompose a contour into pieces that are fitted gradually, so these models suffer from either coarse contours or complex pipelines. To address these issues, we introduce EdgeText, which fits text contours compactly while avoiding excessive contour rebuilding. Concretely, we observe that the two long edges of a text instance can be regarded as smooth curves. This allows us to build contours from continuous, smooth edges that tightly cover text regions instead of fitting them piecewise, which avoids both limitations of current models. Inspired by this observation, EdgeText formulates text representation as an edge approximation problem using parameterized curve-fitting functions. At inference, the model first locates text centers and then, from these points, creates curve functions that approximate the text edges. Meanwhile, truncation points are determined from location features. Finally, curve segments are extracted from the curve functions using the pixel coordinates provided by the truncation points, and these segments reconstruct the text contours. Furthermore, because EdgeText depends heavily on text edges, we design a bilateral enhanced perception (BEP) module that encourages the model to attend to edge features. Additionally, to accelerate learning of the curve-function parameters, we introduce a proportional integral loss (PI-loss), which forces the model to focus on the curve distribution and prevents it from being disturbed by text scale. Ablation experiments demonstrate that EdgeText fits scene text compactly and naturally, and comparisons show that it outperforms existing methods on multiple public datasets. Code is available at https://github.com/omtcyang/EdgeTD.
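The abstract describes contour reconstruction only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each long edge is a polynomial dy = f(dx) in offsets from a detected text center, and that each truncation point reduces to one pixel x-coordinate where an edge is cut. The function names `eval_poly`, `sample_edge`, and `rebuild_contour` are hypothetical, and the BEP module and PI-loss are not modelled.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def eval_poly(coeffs: Sequence[float], t: float) -> float:
    """Evaluate c0 + c1*t + c2*t**2 + ... using Horner's rule."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * t + c
    return value


def sample_edge(
    center: Point,
    coeffs: Sequence[float],
    x_start: float,
    x_end: float,
    num_points: int,
) -> List[Point]:
    """Sample one edge curve between two truncation x-coordinates (pixels).

    Hypothetical parameterization (assumption, not from the paper): the edge is
    dy = f(dx), a polynomial in the horizontal offset dx from the text center,
    where dy is the vertical offset from the center.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    cx, cy = center
    step = (x_end - x_start) / (num_points - 1)
    points = []
    for i in range(num_points):
        x = x_start + i * step
        points.append((x, cy + eval_poly(coeffs, x - cx)))
    return points


def rebuild_contour(
    center: Point,
    top_coeffs: Sequence[float],
    bottom_coeffs: Sequence[float],
    top_span: Tuple[float, float],
    bottom_span: Tuple[float, float],
    num_points: int = 8,
) -> List[Point]:
    """Join two truncated edge segments into one polygon.

    The top edge is traversed left to right and the bottom edge right to left,
    so consecutive vertices trace the boundary of the text region.
    """
    top = sample_edge(center, top_coeffs, *top_span, num_points)
    bottom = sample_edge(center, bottom_coeffs, *bottom_span, num_points)
    return top + bottom[::-1]


# Usage: a text line centred at (50, 20) whose top edge arches upward and whose
# bottom edge is flat; both edges are truncated at x = 10 and x = 90.
polygon = rebuild_contour(
    center=(50.0, 20.0),
    top_coeffs=[-10.0, 0.0, 0.002],
    bottom_coeffs=[10.0],
    top_span=(10.0, 90.0),
    bottom_span=(10.0, 90.0),
    num_points=3,
)
```

With these assumed inputs the polygon runs through roughly (10, 13.2), (50, 10), (90, 13.2) along the arched top edge and back through (90, 30), (50, 30), (10, 30) along the flat bottom edge. The y = f(x) form is the simplest choice for near-horizontal text; it cannot represent vertical lines, so steeply rotated or vertical text would need a different curve parameterization.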




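Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80
Self-citation rate: 27.40%
Articles published: 660
Review time: 5 months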

Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.