Authors: Direselign Addis, Chuan-Ming Liu, Van-Dai Ta
Published in: 2018 International Conference on System Science and Engineering (ICSSE), 2018-06-01
DOI: https://doi.org/10.1109/ICSSE.2018.8519972
Printed Ethiopic Script Recognition by Using LSTM Networks
Bidirectional Long Short-Term Memory (LSTM) networks have achieved strong results on many machine learning tasks, including handwritten and machine-printed character recognition. The Ethiopic script uses a large number of characters, many of which are visually similar, which makes OCR development challenging. In this paper, we apply bidirectional LSTM neural networks to the recognition of machine-printed Ethiopic script. To train and test the model, we collect text files from different sources written in Amharic, Ge'ez, and Tigrigna and generate 96,000 artificial text-line images by applying different degradation techniques. Additionally, to test the model on real scanned documents, we use 12 scanned pages from the Tsenat book. Without any language modeling or other post-processing, the LSTM networks attain an average character error rate of 2.12%, which indicates that the proposed network achieves a promising result.
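
The abstract reports a character error rate (CER) but does not define it. A minimal sketch of the commonly used definition follows, assuming CER is the total Levenshtein edit distance between recognized and reference text lines divided by the total number of reference characters. The function names `edit_distance` and `character_error_rate` are illustrative and are not taken from the paper. Because each Ethiopic syllable is encoded as a single Unicode code point, Python string indexing counts one unit per character here.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """CER in percent over paired text lines (assumed definition, see lead-in)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_chars


if __name__ == "__main__":
    # Hypothetical example line: one visually similar character is confused.
    reference = ["ሰላም ለዓለም"]
    recognized = ["ሰላም ለአለም"]
    print(f"CER = {character_error_rate(reference, recognized):.2f}%")
```

Under this definition, a single substituted character in an eight-character reference line yields a CER of 12.50%. Summing errors and lengths over all lines before dividing weights long lines more heavily than averaging per-line rates would; the abstract does not say which aggregation the authors used.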