An Approach for Word Segmentation from a Line Segment in Odia Text Using Quartiles

2022 5th International Conference on Computational Intelligence and Networks (CINE). Pub Date: 2022-12-01. DOI: 10.1109/CINE56307.2022.10037532

Aradhana Kar, S. Pradhan



Abstract


This paper addresses word segmentation from a given line segment of Odia text. A line segment may contain the alphabets together with their matras, or the alphabets and matras of one line of text may lie in two different line segments. The line text is segmented into alphabets, and the associated matras that lie in two different line segments are reconstructed using the Reconstruct module. The proposed approach has three phases: the Pre_Processing module, the Find_White_Spaces module, and the Analyse_White_Spaces module. The Pre_Processing module reads the input line segment, converts it to a grayscale image, removes the white space surrounding the text, and then converts the result to a binary image. The Find_White_Spaces module finds the start and end of each white space between words. The Analyse_White_Spaces module analyses the widths of the white spaces using quartiles and stores the segmented words in the directory ‘Segmented Words’. The system has been tested on images of line segments consisting of alphabets only and of alphabets with matras. The approach achieves an overall word segmentation accuracy of 99.9%.
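The three phases described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the binarisation threshold or the exact quartile rule, so the fixed `ink_threshold` and the cutoff at the midpoint of Q1 and Q3 are hypothetical choices, and the function names simply mirror the module names. Reading the image from disk, colour-to-gray conversion, the Reconstruct module, and writing to the 'Segmented Words' directory are left out.

```python
import numpy as np


def pre_process(gray, ink_threshold=128):
    """Crop the blank border around the text, then binarise (True = ink).

    Assumes a 2-D grayscale array with dark ink on a light background;
    the fixed threshold is an assumption, not taken from the paper.
    """
    gray = np.asarray(gray)
    ink = gray < ink_threshold
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:  # completely blank image: nothing to segment
        return np.zeros((0, 0), dtype=bool)
    cropped = gray[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    return cropped < ink_threshold


def find_white_spaces(binary):
    """Return (start, end) column pairs, end exclusive, for each blank run."""
    blank = ~binary.any(axis=0)
    padded = np.concatenate(([False], blank, [False]))
    # Indices where the blank/non-blank state flips: even = start, odd = end.
    change = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(change[0::2].tolist(), change[1::2].tolist()))


def analyse_white_spaces(binary, gaps):
    """Split the line at gaps wider than a quartile-based cutoff.

    Assumed rule: a gap separates two words when its width exceeds the
    midpoint of the first and third quartiles of all gap widths.
    """
    if not gaps:
        return [binary]
    widths = np.array([end - start for start, end in gaps])
    q1, q3 = np.percentile(widths, [25, 75])
    cutoff = (q1 + q3) / 2
    words, left = [], 0
    for (start, end), width in zip(gaps, widths):
        if width > cutoff:
            words.append(binary[:, left:start])
            left = end
    words.append(binary[:, left:])
    return words
```

With this assumed rule, a line in which all gaps have the same width is returned as a single word, so a real system would need a rule that also handles lines made up only of word gaps.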
