Discrimination between Arabic and Latin from bilingual documents

2011 International Conference on Communications, Computing and Control Applications (CCCA)
Pub Date: 2011-03-03
DOI: 10.1109/CCCA.2011.6031496

Sofiene Haboubi, S. Maddouri, H. Amiri

Citations: 27

Abstract

An important task in machine learning is the electronic reading of documents. In this process, discriminating between languages is one of the first steps of automatic document text recognition. We are interested in processing mixed Arabic/Latin printed documents. Our method is based essentially on word extraction: we first extract structural features of words and then recognize the writing language. Finally, we present the results of our classification approach and discuss possible improvements.
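The abstract names three stages (word extraction, structural feature extraction, language classification) but does not specify them. The sketch below only illustrates how such a pipeline could fit together; every concrete choice in it is an assumption, not the authors' method. Words are split wherever at least `min_gap` blank columns occur; the two features (the share of ink on the densest row, as a rough proxy for the strong horizontal baseline often associated with printed Arabic, and the mean number of vertical ink runs per column) are hypothetical; and the classifier is a plain nearest-centroid rule swapped in for illustration. The function names and toy images are likewise hypothetical. It needs Python 3.8+ for `math.dist`.

```python
from math import dist


def split_words(line, min_gap=2):
    """Split a binary text-line image (rows of 0/1 ints) into word images.

    Hypothetical rule: a word ends where at least `min_gap` consecutive
    columns contain no ink.
    """
    width = len(line[0])
    ink_cols = [any(row[c] for row in line) for c in range(width)]
    words, start, gap = [], None, 0
    for c, has_ink in enumerate(ink_cols):
        if has_ink:
            if start is None:
                start = c
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                end = c - gap + 1  # first blank column after the word
                words.append([row[start:end] for row in line])
                start, gap = None, 0
    if start is not None:  # word still open at the right edge; drop trailing blanks
        words.append([row[start:width - gap] for row in line])
    return words


def structural_features(word):
    """Two illustrative (assumed) structural features of a word image."""
    row_ink = [sum(row) for row in word]
    total = sum(row_ink) or 1
    # Share of all ink on the densest row: a rough baseline-strength proxy.
    baseline_ratio = max(row_ink) / total
    # Mean number of separate vertical ink runs per column.
    width = len(word[0])
    runs = 0
    for c in range(width):
        prev = 0
        for row in word:
            if row[c] and not prev:
                runs += 1
            prev = row[c]
    return (baseline_ratio, runs / width)


def train_centroids(samples):
    """Average the feature vectors of each label: {label: centroid}."""
    sums, counts = {}, {}
    for feats, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}


def classify(feats, centroids):
    """Nearest-centroid decision using Euclidean distance."""
    return min(centroids, key=lambda lab: dist(feats, centroids[lab]))


# Toy 4-row word images (hypothetical data, for illustration only).
arabic_like = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],  # continuous horizontal stroke
    [0, 0, 0, 0, 0, 0],
]
latin_like = [[1, 0, 1, 0, 1] for _ in range(4)]  # separate vertical strokes

centroids = train_centroids([
    (structural_features(arabic_like), "arabic"),
    (structural_features(latin_like), "latin"),
])

# A text line: the two words separated by three blank columns.
line = [ar + [0, 0, 0] + lat for ar, lat in zip(arabic_like, latin_like)]

if __name__ == "__main__":
    for word in split_words(line):
        print(classify(structural_features(word), centroids))
```

Run as a script, this prints `arabic` then `latin`. The toy images only demonstrate the data flow; a real system would train on many labelled words extracted from scanned bilingual pages and would likely need richer features than these two.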
