Page Object Detection in Vietnamese Document Images with Novel Approach
Authors: Luc T. Le, Trong-Thuan Nguyen, Khang Nguyen
Venue: 2022 9th NAFOSTED Conference on Information and Computer Science (NICS)
Publication date: 2022-10-31
DOI: 10.1109/NICS56915.2022.10013374 (https://doi.org/10.1109/NICS56915.2022.10013374)
Citations: 0
Abstract
Vietnamese documents are increasingly common on online platforms, and digitized Vietnamese documents (e.g., administrative texts, scientific papers, and textbooks) are available online. As a result, we need algorithms that can understand these documents. Vietnamese is one of the most difficult languages written in the Latin alphabet, because it adds accent marks and derived characters. Document understanding also remains challenging because of external and internal factors. It therefore requires a sufficiently accurate detector as the foundation for information extraction tasks. In this research, we address page object detection in Vietnamese document images. We evaluate detection performance on UIT-DODV-Ext, the largest Vietnamese document image dataset that includes scientific papers and textbooks. We build on a state-of-the-art object detector and propose CasGRoIENet to improve performance on UIT-DODV-Ext. CasGRoIENet achieves 75.9% mAP, which is 2.3% higher than the previous state-of-the-art result.
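
For readers unfamiliar with the mAP figure quoted above, the following is a minimal sketch of how average precision is commonly computed for one object class. The abstract does not say which evaluation protocol the authors use, so the IoU threshold of 0.5, the greedy matching rule, the all-point integration of the precision-recall curve, and every function name here are assumptions chosen for illustration, not the paper's evaluation code. A real evaluation pools predictions over all images of the dataset; this toy version works on a single set of boxes.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], gts: List[Box],
                      thr: float = 0.5) -> float:
    """AP for one class: area under the precision-recall curve.

    preds holds (confidence, box) pairs; thr is an assumed IoU threshold.
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = fp = 0
    points = []  # (recall, precision) after each prediction, best score first
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedily pick the best ground-truth box that is still unmatched.
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            if not matched[j]:
                v = iou(box, gt)
                if v > best_iou:
                    best_j, best_iou = j, v
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
        points.append((tp / len(gts), tp / (tp + fp)))
    # Replace each precision by the maximum precision at any later point,
    # which makes the curve non-increasing in recall.
    envelope = []
    best = 0.0
    for r, p in reversed(points):
        best = max(best, p)
        envelope.append((r, best))
    # Sum rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for r, p in reversed(envelope):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0


# Toy data: two ground-truth boxes, three predictions (one is a false positive).
toy_gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
toy_preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
toy_ap = average_precision(toy_preds, toy_gts)
```

In this toy case the false positive ranked second lowers precision for the second half of the recall range, so the AP falls below 1 even though both ground-truth boxes are eventually found. Benchmarks that report mAP over several IoU thresholds repeat this computation per threshold and average the results.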