Thai OCR error correction using genetic algorithm
B. Kruatrachue, K. Somguntar, K. Siriboon
First International Symposium on Cyber Worlds, 2002. Proceedings. Published 2002-11-06.
DOI: 10.1109/CW.2002.1180870 (https://doi.org/10.1109/CW.2002.1180870)
Citations: 2
Abstract
This paper presents an efficient method for Thai OCR error correction based on a genetic algorithm (GA). The correction process starts by constructing a word graph from dictionary-based spell checking. The graph is then searched for the corrected sentence that scores best under a language model (bi-gram and tri-gram perplexity) combined with the word probabilities from OCR. For a long sentence the search space is huge, and a GA is used to make the search tractable. Instead of a standard binary string, the chromosome is encoded as a list of nodes, so that it can represent any path through the graph. The performance of the proposed technique is evaluated and compared with full search on test sentences of different lengths, built from word graphs of 10 to 200 nodes.
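
To make the list-of-nodes encoding concrete, here is a minimal Python sketch of a GA searching a word graph. It is an illustration under stated assumptions, not the authors' implementation: the abstract specifies only the node-list chromosome, the language-model and OCR scoring, and the comparison with full search. The toy graph, the bigram table, the log-probability fitness (used here in place of perplexity), and the crossover, mutation, and selection operators are all hypothetical choices made for this example.

```python
import math
import random

# Toy word graph (hypothetical data): node -> (word, OCR probability, successors).
GRAPH = {
    0: ("<s>", 1.0, [1, 2]),
    1: ("the", 0.6, [3, 4]),
    2: ("tho", 0.4, [3, 4]),
    3: ("cat", 0.5, [5]),
    4: ("cot", 0.5, [5]),
    5: ("</s>", 1.0, []),
}
START, END = 0, 5

# Hypothetical bigram probabilities; unseen pairs get a small floor value.
BIGRAM = {("<s>", "the"): 0.5, ("the", "cat"): 0.4, ("cat", "</s>"): 0.5,
          ("the", "cot"): 0.05, ("cot", "</s>"): 0.3}
FLOOR = 0.01

def fitness(path):
    """Sum of log OCR probabilities and log bigram probabilities along the path."""
    score = sum(math.log(GRAPH[n][1]) for n in path)
    for a, b in zip(path, path[1:]):
        score += math.log(BIGRAM.get((GRAPH[a][0], GRAPH[b][0]), FLOOR))
    return score

def random_path(rng, start=START):
    """Random walk from start to END, so every chromosome is a valid path."""
    path = [start]
    while path[-1] != END:
        path.append(rng.choice(GRAPH[path[-1]][2]))
    return path

def crossover(rng, a, b):
    """Swap tails at an interior node shared by both parents (child stays valid)."""
    shared = [n for n in a[1:-1] if n in b]
    if not shared:
        return a[:]
    n = rng.choice(shared)
    return a[:a.index(n)] + b[b.index(n):]

def mutate(rng, path):
    """Regrow the path from a randomly chosen position before END."""
    i = rng.randrange(len(path) - 1)
    return path[:i] + random_path(rng, path[i])

def run_ga(seed=0, pop_size=8, generations=20, mutation_rate=0.2):
    rng = random.Random(seed)
    pop = [random_path(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(rng, *rng.sample(parents, 2))
            if rng.random() < mutation_rate:
                child = mutate(rng, child)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

def full_search(node=START):
    """Enumerate every path from node to END (feasible only for small graphs)."""
    if node == END:
        return [[END]]
    return [[node] + rest for nxt in GRAPH[node][2] for rest in full_search(nxt)]

best = run_ga()
print([GRAPH[n][0] for n in best], round(fitness(best), 3))
```

Because a chromosome is itself a path, crossover and mutation can be written so that they never produce an invalid sentence, which a plain binary string would not guarantee. On a graph this small, `full_search()` enumerates all four paths and can serve as a reference for the GA result; the number of paths grows quickly with graph size, which is the motivation for the GA given in the abstract.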