Language independent rule based classification of printed & handwritten text

2015 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS), Pub Date: 2015-12-01, DOI: 10.1109/EAIS.2015.7368806

T. Saba, A. Almazyad, A. Rehman


Citations: 22

Abstract

Language independent rule based classification of printed & handwritten text

Handwriting in data entry forms and documents usually indicates information filled in by the user, which should be treated differently from the printed text. In the Arab world, this filled-in information is normally in English or Arabic. Moreover, classification approaches differ considerably for machine-printed text and handwritten script. Therefore, prior to segmentation and classification, separating text into printed and handwritten entries is mandatory. This research addresses the problem of language-independent text distinction in multilingual data entry forms. Our main focus is to distinguish machine-printed text from handwritten script in multilingual data entry forms in a language-independent way. The proposed approach explores new statistical and structural features of text lines to classify them into separate categories. Accordingly, a set of classification rules is derived to explicitly differentiate machine-printed and handwritten entries written in any language. An additional novelty of the proposed approach is that no training or training data is required; rather, text is discriminated on the basis of simple rules. Promising experimental results with 90% accuracy show that the proposed approach is simple and robust. Finally, the scheme is independent of the language, style, size, and fonts that commonly co-exist in multilingual data entry forms.
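The abstract does not list the actual features, rules, or thresholds, so the following is only an illustrative sketch of what a training-free, rule-based line classifier might look like. It assumes each text line has already been segmented into connected-component bounding boxes, and it uses two hypothetical features (relative spread of component heights and of bottom edges) with hypothetical thresholds. None of these names or values come from the paper.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# One connected component's bounding box: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def line_features(boxes: List[Box]) -> Dict[str, float]:
    """Compute two illustrative (assumed) statistics for one text line."""
    if not boxes:
        raise ValueError("a text line needs at least one component")
    heights = [h for (_, _, _, h) in boxes]
    bottoms = [y + h for (_, y, _, h) in boxes]
    mean_h = mean(heights)
    return {
        # Spread of component heights relative to the mean height.
        "height_cv": pstdev(heights) / mean_h,
        # Spread of bottom edges relative to the mean height (baseline wobble).
        "baseline_spread": pstdev(bottoms) / mean_h,
    }


def classify_line(
    boxes: List[Box],
    height_cv_max: float = 0.25,        # hypothetical threshold
    baseline_spread_max: float = 0.15,  # hypothetical threshold
) -> str:
    """Label a line 'printed' if both statistics are small, else 'handwritten'."""
    f = line_features(boxes)
    regular = (
        f["height_cv"] <= height_cv_max
        and f["baseline_spread"] <= baseline_spread_max
    )
    return "printed" if regular else "handwritten"


# Toy data: uniform boxes on a shared baseline vs. irregular boxes.
printed_line = [(0, 10, 8, 20), (10, 10, 8, 20), (20, 11, 8, 19), (30, 10, 8, 20)]
script_line = [(0, 5, 12, 30), (14, 12, 9, 14), (25, 2, 15, 38), (42, 10, 7, 20)]

print(classify_line(printed_line))
print(classify_line(script_line))
```

Both features are ratios of pixel measurements, so they do not depend on the script or on absolute font size, which is the property the abstract emphasizes. Any real thresholds would have to be chosen from the statistics the authors report.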
