RITD: Real-time industrial text detection with boundary- and pixel-aware modules
Authors: Yize Yang, Mingdi Hu, Jianxun Yu, Bingyi Jing
Journal: Displays, volume 87, Article 102973 (published 2025-01-19)
DOI: 10.1016/j.displa.2025.102973
URL: https://www.sciencedirect.com/science/article/pii/S0141938225000101
Impact factor: 3.7; JCR: Q1 (Computer Science, Hardware & Architecture)
Industrial character images often exhibit challenges such as reflective surfaces, similar characters, tilt, and faint imprints due to complex industrial environments. Despite this, few text detection algorithms have been specifically designed to handle these difficult characteristics, which limits the effectiveness of industrial intelligent management, logistics, and related applications. To address these challenges, we propose RITD, a real-time industrial text detection algorithm enhanced with boundary-aware and pixel-aware submodules. These submodules enhance the model’s ability to learn discriminative local features and the implicit relationships between text structures, significantly improving its capability to handle text in complex scenes. In the boundary-aware submodule, we design a multi-level semantic information fusion method to accurately capture the structural details of text boundaries. In the pixel-aware submodule, we propose a pixel-normalized attention mechanism and a spatial attention mechanism that direct the model’s focus to fine-grained boundary features. Our model was trained and evaluated on the MPSC industrial dataset and the ICDAR2015 natural scene dataset, achieving F-measures of 85.66% and 87.6%, respectively, representing the highest performance in text detection while maintaining exceptional detection speed. The code for this study will be made openly available at https://github.com/mendy-2013 once the article is published.
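For readers unfamiliar with the metric quoted in the abstract, the sketch below shows how an F-measure is commonly computed as the harmonic mean of precision and recall. It is a minimal illustration, not the authors' evaluation code: the function names and the counts are hypothetical, and the rule for deciding when a detection matches a ground-truth text region (for example, an overlap threshold) is assumed to have been applied beforehand.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(num_matched: int, num_detected: int, num_ground_truth: int):
    """Return (precision, recall, F-measure) from match counts.

    num_matched: detections matched to a ground-truth text region
    num_detected: all detections produced by the model
    num_ground_truth: all annotated text regions
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical counts chosen only for illustration (not results from the paper).
p, r, f = detection_scores(num_matched=80, num_detected=100, num_ground_truth=90)
print(f"precision={p:.4f} recall={r:.4f} F-measure={f:.4f}")
```

Because the harmonic mean penalizes imbalance, a detector cannot reach a high F-measure by maximizing only precision or only recall.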
About the journal:
Displays is the international journal covering the research and development of display technology, the effective presentation and perception of information, and applications and systems including the display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.