Segmentation of Merged Lines and Script Identification in Handwritten Bilingual Documents

Ranjana S. Zinjore, R. Ramteke, Varsha M. Pathak
Proceedings of the 9th Annual Meeting of the Forum for Information Retrieval Evaluation, 2017-12-08
DOI: 10.1145/3158354.3158360
Text line segmentation is a challenging task in Optical Character Recognition because handwriting styles vary from writer to writer and because characters or Matras from adjacent lines often touch. In this paper, we propose an algorithm that splits each merged line into the individual lines it contains in handwritten bilingual (Marathi-English) documents. The algorithm was tested on a variety of images and gave promising results. The script of each word is then identified by fusing moment-based features with visually discriminating features. Two classifiers are evaluated on a dataset of 242 Marathi-English words for training and 82 words for testing. The average identification accuracy is 67% with a k-NN classifier and 80.14% with an SVM classifier.
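The abstract does not spell out how merged lines are divided. A common baseline for this kind of task, shown here only as a minimal sketch and not as the authors' algorithm, uses the horizontal projection profile: count the ink pixels in each row, treat runs of inked rows as line bands, and split any band that is much taller than the median band at its least-inked rows, where touching characters or Matras contribute only a few pixels. The function names and the `merge_ratio` threshold are hypothetical, and the input is assumed to be a binarized, deskewed image given as a list of rows (1 = ink).

```python
from statistics import median


def row_profile(img):
    """Horizontal projection profile: ink pixels (value 1) in each row."""
    return [sum(row) for row in img]


def find_bands(profile):
    """Return (start, end) row ranges, end exclusive, of consecutive inked rows."""
    bands, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y
        elif count == 0 and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def split_merged_lines(img, merge_ratio=1.5):
    """Split bands much taller than the median band into separate lines.

    merge_ratio is a hypothetical threshold: a band taller than
    merge_ratio * median band height is treated as several merged lines.
    """
    profile = row_profile(img)
    bands = find_bands(profile)
    if not bands:
        return []
    typical = median(end - start for start, end in bands)
    half = max(1, int(typical // 2))  # search radius around each expected cut
    lines = []
    for start, end in bands:
        height = end - start
        if height <= merge_ratio * typical:
            lines.append((start, end))
            continue
        k = max(2, round(height / typical))  # estimated number of merged lines
        cuts = [start]
        for i in range(1, k):
            expected = start + round(i * height / k)
            lo = max(cuts[-1] + 1, expected - half)
            hi = min(end - 1, expected + half)
            lo = min(lo, hi)
            # Cut at the least-inked row: touching strokes add only a few pixels.
            cuts.append(min(range(lo, hi + 1), key=lambda y: profile[y]))
        cuts.append(end)
        lines.extend(zip(cuts[:-1], cuts[1:]))
    return lines
```

Because each cut falls on the least-inked row near an expected boundary, a stroke that touches both lines is simply sliced at that row; a fuller implementation would reassign such connected components to one of the two lines.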
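The abstract names the ingredients of word-level script identification (moment-based features, visually discriminating features, feature fusion, and k-NN/SVM classifiers) but not the exact features. The sketch below assumes, for illustration only, three normalized second-order central moments as the moment features and a hypothetical "headline ratio" as the visual feature, fuses them by concatenation, and classifies with a plain Euclidean k-NN. It is not the authors' feature set; all function names are hypothetical, and word images are assumed to be binarized lists of rows (1 = ink).

```python
import math
from collections import Counter


def central_moments(img):
    """Normalized second-order central moments (eta20, eta02, eta11) of the ink."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = len(pts)
    if m00 == 0:
        return [0.0, 0.0, 0.0]
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    norm = m00 ** 2  # eta_pq = mu_pq / m00^(1 + (p + q) / 2), and p + q = 2 here
    return [mu20 / norm, mu02 / norm, mu11 / norm]


def headline_ratio(img):
    """Hypothetical visual feature: ink fraction of the densest row.

    Devanagari (Marathi) words usually carry a horizontal headline
    (shirorekha), so this value tends to be higher than for English words.
    """
    if not img or not img[0]:
        return 0.0
    return max(sum(row) for row in img) / len(img[0])


def fused_features(img):
    """Feature-level fusion by simple concatenation."""
    return central_moments(img) + [headline_ratio(img)]


def knn_predict(train, query, k=3):
    """train is a list of (feature_vector, label); majority vote of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

In practice the fused features would normally be scaled to comparable ranges before computing distances, and the SVM variant would replace `knn_predict` with a trained classifier such as scikit-learn's `sklearn.svm.SVC`.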