A Novel Simplified Approach for Text Line Extraction of Handwritten Malayalam Document
P. V. Pearlsy, D. Sankar
2020 Advanced Computing and Communication Technologies for High Performance Applications (ACCTHPA), 2020-07-01
DOI: 10.1109/ACCTHPA49271.2020.9213218
This paper presents a novel and simple method for extracting individual text lines from handwritten Malayalam documents. The main challenge in text line extraction from handwritten documents is the segmentation of touching lines. In Malayalam, symbols such as the chandrakkala can also be misclassified as a separate line because of the small gap between the Malayalam letter and the symbol. This paper addresses both touching lines and the misclassification of characters such as the chandrakkala as separate lines. In the proposed method, the scanned handwritten document is divided into vertical stripes before line segmentation to account for possibly skewed lines. Lines are then extracted in each vertical stripe separately using the horizontal projection method. Touching lines, characters such as the chandrakkala segmented as separate lines, and extra lines caused by noise are handled using the median of the line heights within each vertical stripe. Dividing the document into vertical stripes cuts through characters at the stripe boundaries. The paper therefore also presents a way to rejoin these cut characters by compensating for the distance of the characters from the top of the line at the joining edge of each vertical stripe.
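The stripe-wise projection and median-height steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `low` and `high` thresholds, the rule of attaching a short run to the line below it, and the equal-height split of tall runs are all hypothetical choices, since the abstract does not give these details. The step that rejoins characters cut at stripe edges is not covered.

```python
import numpy as np


def projection_runs(stripe):
    """Return (top, bottom) row ranges, bottom exclusive, where the
    horizontal projection of a binary stripe (1 = ink) is nonzero."""
    inked = stripe.sum(axis=1) > 0
    # Pad with False on both ends so every run has a start and an end edge.
    padded = np.concatenate(([False], inked, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(t), int(b)) for t, b in zip(changes[0::2], changes[1::2])]


def refine_by_median(runs, low=0.5, high=1.5):
    """Adjust runs using the median run height of one stripe.

    Assumed rules (not taken from the paper):
    - a run shorter than low * median is attached to the next run,
      e.g. a chandrakkala-like mark sitting above its letter;
      a short run with no run after it is dropped as noise;
    - a run taller than high * median is treated as touching lines
      and split into equal parts.
    """
    if not runs:
        return []
    median = float(np.median([b - t for t, b in runs]))
    refined = []
    pending_top = None  # top of a short run waiting for the next line
    for top, bottom in runs:
        height = bottom - top
        if height < low * median:
            if pending_top is None:
                pending_top = top
            continue
        if pending_top is not None:
            top, pending_top = pending_top, None
        if height > high * median:
            parts = int(round(height / median))
            edges = np.linspace(top, bottom, parts + 1).astype(int)
            refined.extend(
                (int(a), int(b)) for a, b in zip(edges[:-1], edges[1:])
            )
        else:
            refined.append((top, bottom))
    return refined


def extract_lines(binary, n_stripes):
    """Split a binary page image into vertical stripes and return the
    refined line ranges of each stripe, in left-to-right order."""
    stripes = np.array_split(binary, n_stripes, axis=1)
    return [refine_by_median(projection_runs(s)) for s in stripes]
```

Calling `extract_lines(page, n_stripes)` on a binarized page (a 2-D array with 1 for ink) returns one list of row ranges per stripe. Working stripe by stripe means a skewed line only needs to be roughly horizontal within each narrow stripe, which is the motivation the abstract gives for the division.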