Language independent rule based classification of printed & handwritten text

T. Saba, A. Almazyad, A. Rehman
{"title":"Language independent rule based classification of printed & handwritten text","authors":"T. Saba, A. Almazyad, A. Rehman","doi":"10.1109/EAIS.2015.7368806","DOIUrl":null,"url":null,"abstract":"Handwriting in data entry forms/documents usually indicates user's filled information that should be treated differently from the printed text. In Arab world, these filled information are normally in English or Arabic. Secondly, classification approaches are quite different for machine printed and script. Therefore, prior to segmentation & classification, text distinction into Printed & script entries is mandatory. In this research, the dilemma of the language independent text distinction in multilingual data entry forms is addressed. Our main focus is to distinguish the machine printed text and script in multilingual data entry forms that are language independent. The proposed approach explore new statistical and structural features of text lines to classify them into separate categories. Accordingly a set of classification rules is derived to explicitly differentiate machine printed and handwritten entries, written in any language. Additional, novelty of the proposed approach is that no training/training data is required rather text is discriminated on basis of simple rules. Promising experimental results with 90 % accuracy exhibit that proposed approach is simple and robust. Finally, the scheme is independent of language, style, size, and fonts that commonly co-exist in multilingual data entry forms.","PeriodicalId":325875,"journal":{"name":"2015 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/EAIS.2015.7368806","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22

Abstract

Handwriting in data entry forms/documents usually indicates user's filled information that should be treated differently from the printed text. In Arab world, these filled information are normally in English or Arabic. Secondly, classification approaches are quite different for machine printed and script. Therefore, prior to segmentation & classification, text distinction into Printed & script entries is mandatory. In this research, the dilemma of the language independent text distinction in multilingual data entry forms is addressed. Our main focus is to distinguish the machine printed text and script in multilingual data entry forms that are language independent. The proposed approach explore new statistical and structural features of text lines to classify them into separate categories. Accordingly a set of classification rules is derived to explicitly differentiate machine printed and handwritten entries, written in any language. Additional, novelty of the proposed approach is that no training/training data is required rather text is discriminated on basis of simple rules. Promising experimental results with 90 % accuracy exhibit that proposed approach is simple and robust. Finally, the scheme is independent of language, style, size, and fonts that commonly co-exist in multilingual data entry forms.
基于语言独立规则的印刷和手写文本分类
数据输入表单/文档中的手写通常表示用户填写的信息,应与打印文本区别对待。在阿拉伯世界,这些填充信息通常是英语或阿拉伯语。其次,机器印刷和手写体的分类方法有很大的不同。因此,在分割和分类之前,必须将文本区分为印刷品和脚本条目。本研究解决了多语言数据录入表单中语言无关的文本区分问题。我们的主要重点是区分机器打印文本和脚本的多语言数据输入形式是独立的语言。该方法探索文本行新的统计和结构特征,将文本行划分为不同的类别。因此,导出了一组分类规则,以显式区分机器打印和手写的条目,以任何语言编写。此外,该方法的新颖之处在于不需要训练/训练数据,而是根据简单的规则对文本进行区分。实验结果表明,该方法简单、鲁棒性好,准确率高达90%。最后,该方案独立于多语言数据输入表单中通常共存的语言、样式、大小和字体。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信