基于自适应阈值法和几何特征的阿拉伯文扫描图书物理布局分析

Maitham A. Al-Dobais, F. Alrasheed, Ghazanfar Latif, Loay Alzubaidi
{"title":"基于自适应阈值法和几何特征的阿拉伯文扫描图书物理布局分析","authors":"Maitham A. Al-Dobais, F. Alrasheed, Ghazanfar Latif, Loay Alzubaidi","doi":"10.1109/ASAR.2018.8480378","DOIUrl":null,"url":null,"abstract":"In the digital age, developing an automated system to convert old printed books into digital form is a challenging task. In this paper we propose a novel technique for the recognition of Arabic scanned documents both with normal and complex layouts. The proposed algorithm is based on the local adaptive thresholding and geometric features which according to the author’s knowledge is the first time it is applied to Arabic document image recognition based on the Physical Layout Analysis (PLA). The proposed method was applied to dataset consisting of 90 images collected from 700 books from various publishers and contains a total of 1112 zones; text zone, image zone, and graphic zone. The proposed algorithm achieved promising results with overall average recognition of 86.71% for Text and Image block regions for all three sets. The proposed novel algorithm outperforms the techniques mentioned in previous literature.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Adoptive Thresholding and Geometric Features based Physical Layout Analysis of Scanned Arabic Books\",\"authors\":\"Maitham A. Al-Dobais, F. Alrasheed, Ghazanfar Latif, Loay Alzubaidi\",\"doi\":\"10.1109/ASAR.2018.8480378\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the digital age, developing an automated system to convert old printed books into digital form is a challenging task. In this paper we propose a novel technique for the recognition of Arabic scanned documents both with normal and complex layouts. The proposed algorithm is based on the local adaptive thresholding and geometric features which according to the author’s knowledge is the first time it is applied to Arabic document image recognition based on the Physical Layout Analysis (PLA). The proposed method was applied to dataset consisting of 90 images collected from 700 books from various publishers and contains a total of 1112 zones; text zone, image zone, and graphic zone. The proposed algorithm achieved promising results with overall average recognition of 86.71% for Text and Image block regions for all three sets. The proposed novel algorithm outperforms the techniques mentioned in previous literature.\",\"PeriodicalId\":165564,\"journal\":{\"name\":\"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASAR.2018.8480378\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAR.2018.8480378","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

在数字时代,开发一种将旧印刷书籍转换为数字形式的自动化系统是一项具有挑战性的任务。在本文中,我们提出了一种新的技术来识别阿拉伯文扫描文档的正常和复杂的布局。该算法基于局部自适应阈值和几何特征,首次应用于基于物理布局分析(Physical Layout Analysis, PLA)的阿拉伯文本图像识别。将该方法应用于包含90幅图像的数据集,这些图像来自不同出版商的700本书,共包含1112个区域;文本区、图像区和图形区。该算法在三组文本和图像块区域的总体平均识别率为86.71%,取得了令人满意的结果。提出的新算法优于先前文献中提到的技术。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Adoptive Thresholding and Geometric Features based Physical Layout Analysis of Scanned Arabic Books
In the digital age, developing an automated system to convert old printed books into digital form is a challenging task. In this paper we propose a novel technique for the recognition of Arabic scanned documents both with normal and complex layouts. The proposed algorithm is based on the local adaptive thresholding and geometric features which according to the author’s knowledge is the first time it is applied to Arabic document image recognition based on the Physical Layout Analysis (PLA). The proposed method was applied to dataset consisting of 90 images collected from 700 books from various publishers and contains a total of 1112 zones; text zone, image zone, and graphic zone. The proposed algorithm achieved promising results with overall average recognition of 86.71% for Text and Image block regions for all three sets. The proposed novel algorithm outperforms the techniques mentioned in previous literature.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信