A Novel Arabic Optical Character Recognition Approach Based on Levenshtein Distance
Walid Fakhet, Salim El Khediri, Salah Zidi
Automatic Control and Computer Sciences, vol. 58, no. 5, pp. 519-529. Published 2024-11-06.
DOI: 10.3103/S0146411624700639
https://link.springer.com/article/10.3103/S0146411624700639
Arabic handwritten character recognition (AHCR) is the automatic identification of handwritten Arabic characters. The task is challenging because of the complexity of the Arabic script, which includes a large number of characters with complex shapes and ligatures. In this paper, we present a novel approach based on Levenshtein distance that recognizes Arabic handwritten characters by combining the classification and postprocessing phases. To train the proposed model, we created an Arabic optical character recognition (OCR) context database divided into multiple text files. Each file in the database belongs to one of five well-defined contexts: sport, economy, religion, politics, and culture. Each file contains 15,000 words. The experimental results show that the new method outperforms the state-of-the-art approach. The error rate achieved using 15,000 words was 1.2%.
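As an illustration of how Levenshtein distance can serve in a postprocessing step, the following minimal sketch computes the edit distance between two strings and maps a recognized word to its nearest entry in a context word list. It is not the authors' implementation: the function names `levenshtein` and `correct_word`, the tiny `SPORT_LEXICON`, and the nearest-word selection rule are all assumptions made for this example, and the abstract does not specify how the classification and postprocessing phases are actually combined.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or match)
            ))
        previous = current
    return previous[-1]


def correct_word(candidate: str, lexicon: Sequence[str]) -> str:
    """Return the lexicon entry nearest to candidate (hypothetical helper).

    Ties are resolved in favour of the entry that appears first.
    """
    return min(lexicon, key=lambda word: levenshtein(candidate, word))


# Hypothetical sport-context word list; the paper's database is not reproduced here.
SPORT_LEXICON = ["ملعب", "لاعب", "مباراة", "هدف"]

# A recognizer output with a wrong final letter (ه instead of ة).
recognized = "مباراه"
corrected = correct_word(recognized, SPORT_LEXICON)
print(recognized, "->", corrected)
```

In this sketch the distance is computed over Unicode code points, so a single misrecognized letter counts as one substitution. A real system would likely also normalize diacritics and letter variants before comparison, which is omitted here.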
About the journal:
Automatic Control and Computer Sciences is a peer-reviewed journal that publishes articles on:
• Control systems, cyber-physical systems, real-time systems, robotics, smart sensors, embedded intelligence
• Network information technologies, information security, statistical methods of data processing, distributed artificial intelligence, complex systems modeling, knowledge representation, processing and management
• Signal and image processing, machine learning, machine perception, computer vision