An Approach for Word Segmentation from a Line Segment in Odia Text Using Quartiles

Aradhana Kar, S. Pradhan
Published in: 2022 5th International Conference on Computational Intelligence and Networks (CINE)
Publication date: 2022-12-01
DOI: 10.1109/CINE56307.2022.10037532 (https://doi.org/10.1109/CINE56307.2022.10037532)
Citations: 0

Abstract

This paper deals with word segmentation from a given line segment of Odia text. A line segment may contain both the alphabets and the matras (vowel signs) of a text line, or the alphabets and matras of one text line may lie in two different line segments. When a text line has been split so that its alphabets and their associated matras fall in two different line segments, the two segments are reconstructed into one using the Reconstruct Module. The approach introduced in this paper has three phases: the Pre_Processing Module, the Find_White_Spaces Module, and the Analyse_White_Spaces Module. The Pre_Processing module reads the input line segment, converts it to a grayscale image, removes the white space surrounding the whole text, and then converts the result to a binary image. The Find_White_Spaces module finds the start and end of each white space between words. The Analyse_White_Spaces module analyses the widths of these white spaces using quartiles and stores the segmented words in the directory 'Segmented Words'. The proposed system has been tested on images of line segments containing only alphabets as well as alphabets with matras, and achieves an overall word-segmentation accuracy of 99.9%.
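The abstract names the three phases but does not give the exact quartile rule, so the following is only a minimal, hypothetical sketch of how such a pipeline could work, written in pure Python. It assumes the line image is a list of pixel rows with dark text on a light background, treats a "white space" as a maximal run of ink-free columns, and uses a threshold of our own choosing, Q1 + 0.5 · IQR of the gap widths (the midpoint of the interquartile range), to decide which gaps separate words. The function names (`to_gray`, `binarize`, `crop_margins`, `find_white_spaces`, `word_spans`, `segment_words`) and the fixed binarization threshold of 128 are illustrative assumptions, not the authors' code; saving word images to the 'Segmented Words' directory is omitted.

```python
"""Hypothetical sketch of quartile-based word segmentation for one text line.

Assumptions (not taken from the paper): images are lists of pixel rows,
text is dark on a light background, a white space is a maximal run of
ink-free columns, and a gap separates words when it is wider than
Q1 + 0.5 * IQR of all gap widths in the line.
"""
from statistics import quantiles


def to_gray(rgb):
    """Convert rows of (R, G, B) tuples to 0-255 gray levels (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]


def binarize(gray, threshold=128):
    """Mark dark pixels as ink (1) and light pixels as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop_margins(binary):
    """Remove the all-white rows and columns that surround the text."""
    rows = [i for i, row in enumerate(binary) if any(row)]
    if not rows:
        return []  # blank image: nothing to segment
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def find_white_spaces(binary):
    """Return (start, end) column ranges, end exclusive, of ink-free runs.

    Expects a cropped line, so no run touches the left or right edge.
    """
    spaces, start = [], None
    for c in range(len(binary[0])):
        blank = not any(row[c] for row in binary)
        if blank and start is None:
            start = c  # a white space begins at column c
        elif not blank and start is not None:
            spaces.append((start, c))  # the white space ends before column c
            start = None
    return spaces


def word_spans(binary):
    """Split a cropped binary line into (start, end) column spans, one per word."""
    if not binary:
        return []
    width = len(binary[0])
    spaces = find_white_spaces(binary)
    widths = [end - start for start, end in spaces]
    if len(widths) < 2:
        return [(0, width)]  # too few gaps to estimate quartiles: keep one word
    q1, _, q3 = quantiles(widths, n=4, method="inclusive")
    threshold = q1 + 0.5 * (q3 - q1)  # assumed rule, not from the paper
    spans, left = [], 0
    for start, end in spaces:
        if end - start > threshold:  # wide gap: treat it as a word boundary
            spans.append((left, start))
            left = end
    spans.append((left, width))
    return spans


def segment_words(binary):
    """Cut the line into one binary image per word."""
    return [[row[s:e] for row in binary] for s, e in word_spans(binary)]


# Synthetic line: '#' is ink; gaps of width 1 lie inside words, width 4 between them.
rows = ["##.#.##....#.##....##.#"] * 3
line = [[1 if ch == "#" else 0 for ch in r] for r in rows]
# The same line as a gray image with a 1-pixel white margin on every side.
gray = [[255] * 25] + [[255] + [0 if v else 255 for v in r] + [255] for r in line] + [[255] * 25]

print(word_spans(crop_margins(binarize(gray))))  # [(0, 7), (11, 15), (19, 23)]
```

Placing the cut relative to the quartiles, rather than at a fixed pixel width, lets the threshold adapt to font size and scan resolution. A line with fewer than two gaps is returned unsplit in this sketch because there is no distribution of widths to compare against; how the paper handles that case is not stated in the abstract.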