A Novel Simplified Approach for Text Line Extraction of Handwritten Malayalam Document

2020 Advanced Computing and Communication Technologies for High Performance Applications (ACCTHPA) · Pub Date: 2020-07-01 · DOI: 10.1109/ACCTHPA49271.2020.9213218

P. V. Pearlsy, D. Sankar


Citations: 0

Abstract


A Novel Simplified Approach for Text Line Extraction of Handwritten Malayalam Document

This paper presents a novel and simple method for extracting individual lines from handwritten Malayalam documents. The main challenge in text line extraction from handwritten documents is the segmentation of touching lines. In Malayalam, symbols such as the chandrakkala tend to be classified as a separate line because of the small gap between the Malayalam letter and the chandrakkala. This paper addresses both touching lines and the misclassification of characters such as the chandrakkala as separate lines. In the proposed method, the scanned handwritten document is divided into vertical stripes, and lines are extracted in each stripe separately using the horizontal projection method. Touching lines, characters such as the chandrakkala segmented as separate lines, and extra lines caused by noise are handled using the median line height, computed separately for each vertical stripe. The document image is divided into vertical stripes before line segmentation to account for possibly skewed lines. Dividing the document into vertical stripes cuts through characters at the stripe boundaries. This paper also presents a solution for rejoining the characters cut at the stripe boundaries: it compensates for the distance of the characters from the top of the line at the joining edge of the vertical stripe.
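
The stripe-wise projection and median-height steps described in the abstract can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function names, the `low` and `high` thresholds, the equal-height split of tall runs, and the rule of merging a short run into its nearest neighbour are all assumptions, and the final step of rejoining characters across stripe boundaries is not covered. The input is assumed to be a binarized image with 1 for ink and 0 for background.

```python
import numpy as np


def projection_runs(stripe):
    """Return (top, bottom) row ranges where the horizontal projection is non-zero."""
    profile = stripe.sum(axis=1)  # ink pixels per row
    runs, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y
        elif count == 0 and start is not None:
            runs.append((start, y))  # bottom is exclusive
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def refine_runs(runs, low=0.5, high=1.5):
    """Correct runs using the median run height of one stripe.

    Assumed thresholds: runs taller than high * median are treated as
    touching lines and split evenly; runs shorter than low * median
    (small marks or noise) are merged into the nearest neighbouring run.
    """
    if not runs:
        return []
    median = float(np.median([b - t for t, b in runs]))

    # Split tall runs into roughly median-height pieces.
    split = []
    for t, b in runs:
        h = b - t
        parts = max(1, int(round(h / median))) if h > high * median else 1
        edges = np.linspace(t, b, parts + 1).astype(int)
        split.extend(zip(edges[:-1].tolist(), edges[1:].tolist()))

    # Merge short runs into whichever neighbour is vertically closer.
    out, i = list(split), 0
    while i < len(out) and len(out) > 1:
        t, b = out[i]
        if (b - t) < low * median:
            gap_up = t - out[i - 1][1] if i > 0 else float("inf")
            gap_down = out[i + 1][0] - b if i + 1 < len(out) else float("inf")
            if gap_up <= gap_down:
                out[i - 1] = (out[i - 1][0], b)
            else:
                out[i + 1] = (t, out[i + 1][1])
            del out[i]
        else:
            i += 1
    return out


def extract_lines(binary, n_stripes=4):
    """Return one list of (top, bottom) line ranges per vertical stripe."""
    stripes = np.array_split(binary, n_stripes, axis=1)
    return [refine_runs(projection_runs(s)) for s in stripes]
```

Because the median is computed per stripe, each stripe adapts to its own local line height, which is what lets a stripe-wise method tolerate skewed lines that a single page-wide projection profile would blur together.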

