An Approach for Word Segmentation from a Line Segment in Odia Text Using Quartiles
Aradhana Kar, S. Pradhan
2022 5th International Conference on Computational Intelligence and Networks (CINE), 2022-12-01
DOI: 10.1109/CINE56307.2022.10037532 (https://doi.org/10.1109/CINE56307.2022.10037532)
This paper addresses word segmentation from a given line segment of Odia text. A line segment may contain the alphabets together with their matras (vowel signs), or the alphabets and matras of one line of text may be split across two different line segments. In the latter case, the alphabets and their associated matras are recombined by the Reconstruct Module. The approach has three phases: the Pre_Processing Module, the Find_White_Spaces Module, and the Analyse_White_Spaces Module. The Pre_Processing module reads the input line segment, converts it to a gray image, removes the white space that surrounds the whole text, and then converts the result to a binary image. The Find_White_Spaces module finds the start and end of each white space between the words. The Analyse_White_Spaces module analyses the widths of the white spaces using quartiles and stores the segmented words in the directory ‘Segmented Words’. The proposed system has been tested with images of line segments consisting of alphabets only and of alphabets with matras. The approach achieves an overall word segmentation accuracy of 99.9%.
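
The three phases described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say how the quartiles are turned into a decision, so the sketch swaps in Tukey's upper fence (Q3 + 1.5 × IQR) as a hypothetical rule for separating inter-word gaps from the narrower gaps inside a word. The function names, the fixed binarization threshold of 128, and the parameter `k` are likewise assumptions. Reading the image from disk, the Reconstruct Module, and writing the words to the ‘Segmented Words’ directory are omitted.

```python
import numpy as np


def preprocess(gray, threshold=128):
    """Binarize a grayscale line image (dark ink on a light background)
    and crop the blank margin around the text.
    The fixed threshold is an assumption made for this sketch."""
    ink = gray < threshold                      # True where a pixel is ink
    rows = np.flatnonzero(ink.any(axis=1))      # rows containing ink
    cols = np.flatnonzero(ink.any(axis=0))      # columns containing ink
    if rows.size == 0:                          # completely blank image
        return ink[:0, :0]
    return ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def find_white_spaces(ink):
    """Return (start, end) column ranges, end exclusive, of every run of
    columns that contains no ink."""
    blank = ~ink.any(axis=0)
    # Pad with False so runs touching the border also produce an edge.
    padded = np.concatenate(([False], blank, [False]))
    change = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(change == 1)
    ends = np.flatnonzero(change == -1)
    return list(zip(starts.tolist(), ends.tolist()))


def analyse_white_spaces(ink, gaps, k=1.5):
    """Split the line at gaps whose width exceeds Q3 + k * IQR.
    This fence is a hypothetical stand-in for the paper's quartile rule."""
    width = ink.shape[1]
    if not gaps:
        return [(0, width)]
    widths = np.array([end - start for start, end in gaps])
    q1, q3 = np.percentile(widths, [25, 75])
    fence = q3 + k * (q3 - q1)
    words, left = [], 0
    for (start, end), gap_width in zip(gaps, widths):
        if gap_width > fence:                   # wide gap: word boundary
            words.append((left, start))
            left = end
    words.append((left, width))
    return words


def segment_words(gray):
    """Run the three phases and return the column range of each word,
    expressed in the coordinates of the cropped image."""
    ink = preprocess(gray)
    return analyse_white_spaces(ink, find_white_spaces(ink))
```

Each returned pair `(start, end)` can be used to slice the cropped binary image into one word image. A fence of this kind only fires when inter-word gaps are clearly wider than most gaps in the line, so a line in which every gap has the same width is returned as a single word.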