Kanji Character Detection from Complex Real Scene Images based on Character Properties

Lianli Xu, H. Nagayoshi, H. Sako
Venue: 2008 The Eighth IAPR International Workshop on Document Analysis Systems
Published: 2008-09-16
DOI: 10.1109/DAS.2008.34 (https://doi.org/10.1109/DAS.2008.34)
Citations: 10

Abstract

Character recognition in complex real scene images is a very challenging undertaking. The most popular approach is to segment the text area using some extra pre-knowledge, such as "characters are in a signboard". This approach makes it possible to construct a very time-consuming method, but generality is still a problem. In this paper, we propose a more general method by utilizing only character features. Our algorithm consists of five steps: pre-processing to extract connected components, initial classification using primitive rules, strong classification using AdaBoost, Markov random field (MRF) clustering to combine connected components with similar properties, and post-processing using optical character recognition (OCR) results. The results of experiments using 11 images containing 1691 characters (including characters in bad condition) indicated the effectiveness of the proposed system, namely, that 52.9% of characters were extracted correctly with 625 noise components extracted as characters.
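
The first two of the five steps (connected-component extraction and initial classification with primitive rules) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which rules or thresholds they use, so the function names, the 4-connectivity choice, and the `min_area` and `max_aspect` values below are all assumptions made purely for illustration.

```python
from collections import deque


def connected_components(binary):
    """Label 4-connected foreground regions of a 0/1 grid.

    Returns one dict per region with its bounding box
    (top, left, bottom, right) and its pixel count.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append({"box": (top, left, bottom, right), "area": area})
    return components


def passes_primitive_rules(component, min_area=4, max_aspect=3.0):
    """Hypothetical rule: keep regions that are large enough and not too elongated."""
    top, left, bottom, right = component["box"]
    height, width = bottom - top + 1, right - left + 1
    aspect = max(height, width) / min(height, width)
    return component["area"] >= min_area and aspect <= max_aspect


# Toy binary image: a 2x2 blob, an isolated pixel, and a long horizontal stroke.
image = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
]
components = connected_components(image)
candidates = [c for c in components if passes_primitive_rules(c)]
print(len(components), candidates)
```

On this toy image the isolated pixel fails the area rule and the long stroke fails the aspect-ratio rule, so only the compact blob survives as a character candidate. In the pipeline the abstract describes, such candidates would then go to the AdaBoost classifier, MRF clustering, and OCR-based post-processing, none of which are sketched here.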