Old Sinhala Newspaper Article Segmentation for Content Recognition Using Image Processing
P. P. A. Gayashan, K. Perera, G. D. Shashiwadana Nirmani, L. Ranathunga
2021 From Innovation To Impact (FITI), published 2021-12-08
DOI: https://doi.org/10.1109/fiti54902.2021.9833047
Abstract
Content segmentation plays a major role in automating the digitization of old newspapers. This study segments degraded, mediocre-quality old Sinhala newspapers into separate articles, and also covers classification of the main elements, character segmentation, feature extraction, and character recognition. To remedy the misspelled words generated during recognition, a word correction technique is applied at the end of the process to improve the accuracy of the digitized articles. This paper focuses on the first step of the study: segmenting newspaper pages into separate articles using an approach that embeds heuristic knowledge. The approach comprises image detection, line detection, text area identification, margin detection, and column separation of newspaper pages. The results are comparable to those reported in the existing literature.
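The abstract does not say how column separation is carried out. As an illustration only, the sketch below uses a vertical projection profile, a common heuristic for finding the whitespace gutters between text columns; it is not presented as the authors' method. The function names, the `min_gap_width` parameter, and the input convention (a binarized page as rows of 0/1 values, with 1 meaning ink) are all assumptions made for this example.

```python
def find_column_gaps(binary_page, min_gap_width=3):
    """Return (start, end) x-ranges of blank vertical strips (end exclusive).

    binary_page: list of rows, each a list of 0/1 values (1 = ink).
    This input format is an assumption for the sketch.
    """
    # Vertical projection profile: count of ink pixels in each pixel column.
    ink_per_column = [sum(col) for col in zip(*binary_page)]
    gaps = []
    start = None
    for x, ink in enumerate(ink_per_column):
        if ink == 0 and start is None:
            start = x  # a blank run begins
        elif ink != 0 and start is not None:
            if x - start >= min_gap_width:
                gaps.append((start, x))  # blank run is wide enough to be a gutter
            start = None
    # Handle a blank run that extends to the right edge of the page.
    if start is not None and len(ink_per_column) - start >= min_gap_width:
        gaps.append((start, len(ink_per_column)))
    return gaps


def split_columns(binary_page, min_gap_width=3):
    """Return (left, right) x-ranges of text columns between the detected gutters."""
    width = len(binary_page[0])
    columns = []
    left = 0
    for gap_start, gap_end in find_column_gaps(binary_page, min_gap_width):
        if gap_start > left:
            columns.append((left, gap_start))
        left = gap_end
    if left < width:
        columns.append((left, width))
    return columns


# Toy page: 10 rows, 20 pixel columns, ink in x = 2..7 and x = 12..17.
page = [[1 if 2 <= x < 8 or 12 <= x < 18 else 0 for x in range(20)]
        for _ in range(10)]
columns = split_columns(page)
```

On degraded scans a gutter is rarely perfectly blank, so a practical version would likely tolerate a small amount of ink per pixel column and would run after noise removal and deskewing; those steps are outside this sketch.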