Discrimination between Arabic and Latin from bilingual documents
Sofiene Haboubi, S. Maddouri, H. Amiri
2011 International Conference on Communications, Computing and Control Applications (CCCA), 3 March 2011
DOI: 10.1109/CCCA.2011.6031496
Citations: 27
Abstract
An important task in machine learning is the electronic reading of documents. In this process, distinguishing between languages is one of the first steps in automatic document text recognition. We address the processing of printed documents that mix Arabic and Latin script. Our method relies primarily on word extraction: we first extract structural features from each word and then identify the language in which it is written. Finally, we present the results of our classification approach and discuss possible improvements.
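The abstract does not specify which structural features or which classifier the authors use. As a purely illustrative sketch of the two-stage pipeline it describes (per-word feature extraction, then language classification), the code below computes two hypothetical features from a binary word image: the number of connected components per unit of width, and the normalized row of the horizontal-projection peak as a rough baseline proxy. It then assigns a label with a nearest-centroid rule. The feature choice, the function names (`word_features`, `fit_centroids`, `classify`), the classifier, and the toy data are all assumptions, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A binary word image: 1 = ink pixel, 0 = background (assumed non-empty).
Image = List[List[int]]
Features = Tuple[float, ...]


def connected_components(img: Image) -> int:
    """Count 4-connected ink regions with an iterative flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def word_features(img: Image) -> Features:
    """Hypothetical structural features of one word image (illustration only)."""
    h, w = len(img), len(img[0])
    # Feature 1: connected components per pixel of width.
    density = connected_components(img) / w
    # Feature 2: row with the most ink (horizontal-projection peak),
    # normalized to [0, 1]; a rough proxy for baseline position.
    profile = [sum(row) for row in img]
    peak = max(range(h), key=lambda r: profile[r])
    baseline = peak / max(h - 1, 1)
    return (density, baseline)


def fit_centroids(train: Sequence[Tuple[Features, str]]) -> Dict[str, Features]:
    """Average the feature vectors of each label (nearest-centroid training)."""
    groups: Dict[str, List[Features]] = defaultdict(list)
    for feats, label in train:
        groups[label].append(feats)
    return {
        label: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
        for label, vecs in groups.items()
    }


def classify(centroids: Dict[str, Features], x: Features) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lbl: sum((a - b) ** 2 for a, b in zip(centroids[lbl], x)),
    )


if __name__ == "__main__":
    # Toy feature vectors, made up for illustration only (not measured data).
    train = [((0.6, 0.0), "arabic"), ((0.7, 0.1), "arabic"), ((0.3, 1.0), "latin")]
    centroids = fit_centroids(train)
    word = [[0, 0, 0], [1, 1, 1]]  # tiny synthetic "word" image
    feats = word_features(word)
    print(feats, classify(centroids, feats))
```

Nearest-centroid is used here only because it is the simplest classifier that keeps the two stages visible; replacing the hypothetical features or the classifier would not change the overall structure of extracting per-word features first and deciding the language second.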