{"title":"基于字典匹配的日文OCR后处理方法","authors":"C. Guo, Yuanyan Tang, Changsong Liu, Jia Duan","doi":"10.1109/ICWAPR.2013.6599286","DOIUrl":null,"url":null,"abstract":"This paper describes a post-processing approach for Japanese character recognition based on dictionary. By the analysis of experimental data in the processing of OCR, we find that some segmentation and recognition results do not conform to the rules of lexical and just generate the character based on the shape. If the fonts of pending recognized characters are similar with the others, it will easily lead to going wrong in the processing of OCR. For these errors we put forward an idea based on the Limited Length Segmentation Matching and the Bayesian Statistical Classifier. Through the above method, most of the font recognized mistakes can be solved. By the experimental results, it can be proved that this method is an effective way to improve the recognized rate of Japanese character.","PeriodicalId":236156,"journal":{"name":"2013 International Conference on Wavelet Analysis and Pattern Recognition","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Japanese OCR post-processing approach based on dictionary matching\",\"authors\":\"C. Guo, Yuanyan Tang, Changsong Liu, Jia Duan\",\"doi\":\"10.1109/ICWAPR.2013.6599286\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes a post-processing approach for Japanese character recognition based on dictionary. By the analysis of experimental data in the processing of OCR, we find that some segmentation and recognition results do not conform to the rules of lexical and just generate the character based on the shape. If the fonts of pending recognized characters are similar with the others, it will easily lead to going wrong in the processing of OCR. For these errors we put forward an idea based on the Limited Length Segmentation Matching and the Bayesian Statistical Classifier. Through the above method, most of the font recognized mistakes can be solved. By the experimental results, it can be proved that this method is an effective way to improve the recognized rate of Japanese character.\",\"PeriodicalId\":236156,\"journal\":{\"name\":\"2013 International Conference on Wavelet Analysis and Pattern Recognition\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-07-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 International Conference on Wavelet Analysis and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICWAPR.2013.6599286\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Wavelet Analysis and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICWAPR.2013.6599286","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Japanese OCR post-processing approach based on dictionary matching
This paper describes a post-processing approach for Japanese character recognition based on dictionary. By the analysis of experimental data in the processing of OCR, we find that some segmentation and recognition results do not conform to the rules of lexical and just generate the character based on the shape. If the fonts of pending recognized characters are similar with the others, it will easily lead to going wrong in the processing of OCR. For these errors we put forward an idea based on the Limited Length Segmentation Matching and the Bayesian Statistical Classifier. Through the above method, most of the font recognized mistakes can be solved. By the experimental results, it can be proved that this method is an effective way to improve the recognized rate of Japanese character.