Maitham A. Al-Dobais, F. Alrasheed, Ghazanfar Latif, Loay Alzubaidi
2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), published 2018-03-01. DOI: 10.1109/ASAR.2018.8480378 (https://doi.org/10.1109/ASAR.2018.8480378)
Adoptive Thresholding and Geometric Features based Physical Layout Analysis of Scanned Arabic Books
In the digital age, developing an automated system to convert old printed books into digital form is a challenging task. In this paper, we propose a novel technique for the recognition of scanned Arabic documents with both normal and complex layouts. The proposed algorithm is based on local adaptive thresholding and geometric features, which, to the authors' knowledge, have not previously been applied to Arabic document image recognition based on Physical Layout Analysis (PLA). The proposed method was applied to a dataset of 90 images collected from 700 books from various publishers, containing a total of 1112 zones of three types: text, image, and graphic. The algorithm achieved promising results, with an overall average recognition rate of 86.71% for text and image block regions across all three sets. It outperforms the techniques reported in previous literature.
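The abstract does not give the details of the thresholding step or the geometric features used, so the following is only a generic sketch of the two ideas it names, not the authors' algorithm. It assumes a mean-based local threshold computed with a summed-area table, and a few illustrative block features (aspect ratio, ink density). The function names, the `radius` and `offset` parameters, and the chosen features are all hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def integral_image(img: Image) -> List[List[int]]:
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def adaptive_threshold(img: Image, radius: int = 1, offset: int = 10) -> Image:
    """Mark a pixel as ink (1) if it is darker than its local mean minus offset.

    Assumption: a plain local-mean rule; the paper may use a different one.
    """
    h, w = len(img), len(img[0])
    sat = integral_image(img)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window to the image borders.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            count = (y1 - y0) * (x1 - x0)
            # Compare pixel * count against the window sum to avoid division.
            if img[y][x] * count < total - offset * count:
                out[y][x] = 1
    return out


def block_features(binary: Image, top: int, left: int,
                   bottom: int, right: int) -> Dict[str, float]:
    """Illustrative geometric features of a block (bottom/right exclusive)."""
    height, width = bottom - top, right - left
    ink = sum(binary[y][x] for y in range(top, bottom) for x in range(left, right))
    return {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "ink_density": ink / (width * height),
    }


if __name__ == "__main__":
    page = [[200] * 5 for _ in range(5)]
    page[2][2] = 50  # one dark pixel on a light background
    binary = adaptive_threshold(page, radius=1, offset=10)
    print(block_features(binary, 0, 0, 5, 5))
```

In a layout-analysis setting, features like these could feed a rule or classifier that separates text zones from image zones (for example, text blocks tend to have moderate ink density and regular line structure), but which features and decision rules the paper actually uses cannot be determined from the abstract.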