A Novel Simplified Approach for Text Line Extraction of Handwritten Malayalam Document

P. V. Pearlsy, D. Sankar
Published in: 2020 Advanced Computing and Communication Technologies for High Performance Applications (ACCTHPA)
Publication date: 2020-07-01
DOI: 10.1109/ACCTHPA49271.2020.9213218
Citations: 0

Abstract

This paper presents a novel and simple method for extracting individual lines from handwritten Malayalam documents. The main challenge in text line extraction from handwritten documents is the segmentation of touching lines. In Malayalam, symbols such as the chandrakkala tend to be classified as a separate line because of the small gap between the base character and the chandrakkala. This paper addresses both touching lines and the misclassification of characters such as the chandrakkala as separate lines. In the proposed method, the scanned handwritten document is divided into vertical stripes, and lines are extracted in each vertical stripe separately using the horizontal projection method. Touching lines, characters such as the chandrakkala segmented into a separate line, and extra lines due to noise are handled using the median line height in each vertical stripe. The document image is divided into vertical stripes before line segmentation to account for possibly skewed lines. Dividing the document into vertical stripes cuts through characters that straddle stripe boundaries. The paper therefore also presents a way to rejoin these characters, by compensating for the distance of the characters from the top of the line at the joining edge of the vertical stripes.
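
The stripe-wise pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the binarized input (1 = ink), and the thresholds `small=0.5` and `large=1.5` relative to the median height are assumptions made for illustration, since the abstract only states that median line heights are used. The step that rejoins characters cut at stripe boundaries is not shown.

```python
import numpy as np

def split_into_stripes(binary, n_stripes):
    """Split a binary page image (1 = ink) into vertical stripes."""
    return np.array_split(binary, n_stripes, axis=1)

def projection_runs(stripe):
    """Return (top, bottom) row ranges where the horizontal projection is non-zero."""
    ink = stripe.sum(axis=1) > 0
    runs, start = [], None
    for y, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = y                      # a run of inked rows begins
        elif not has_ink and start is not None:
            runs.append((start, y))        # the run ended at row y (exclusive)
            start = None
    if start is not None:
        runs.append((start, len(ink)))
    return runs

def refine_with_median(runs, small=0.5, large=1.5):
    """Assumed median-height rule: split tall runs, merge short ones.

    The thresholds are hypothetical; they are not taken from the paper.
    """
    if not runs:
        return []
    median = float(np.median([bottom - top for top, bottom in runs]))
    # Runs much taller than the median are treated as touching lines
    # and divided into roughly median-sized parts.
    lines = []
    for top, bottom in runs:
        height = bottom - top
        parts = max(1, int(round(height / median))) if height > large * median else 1
        edges = np.linspace(top, bottom, parts + 1).astype(int)
        lines.extend(zip(edges[:-1].tolist(), edges[1:].tolist()))
    # Runs much shorter than the median (a detached chandrakkala, noise)
    # are merged into the vertically closest neighbouring run.
    while len(lines) > 1:
        short = [i for i, (t, b) in enumerate(lines) if (b - t) < small * median]
        if not short:
            break
        i = short[0]
        gap_prev = lines[i][0] - lines[i - 1][1] if i > 0 else float("inf")
        gap_next = lines[i + 1][0] - lines[i][1] if i < len(lines) - 1 else float("inf")
        j = i - 1 if gap_prev <= gap_next else i + 1
        lo, hi = min(i, j), max(i, j)
        lines[lo:hi + 1] = [(lines[lo][0], lines[hi][1])]
    return lines

def extract_lines_per_stripe(binary, n_stripes=8):
    """Run projection and median refinement independently in each stripe."""
    return [refine_with_median(projection_runs(s))
            for s in split_into_stripes(binary, n_stripes)]
```

Working per stripe keeps the projection profile usable on skewed handwriting, because a line drifts only a little vertically within a narrow stripe. The per-stripe row ranges returned by `extract_lines_per_stripe` would still need to be linked across neighbouring stripes to form full text lines.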