Arabic Historical Documents Layout Analysis using Mask RCNN

Latifa Aljiffry, Hassanin Al-Barhamtoshy, Felwa Abukhodair, Amani Jamal
Procedia Computer Science, Volume 244, Pages 453-460, 2024
DOI: 10.1016/j.procs.2024.10.220
URL: https://www.sciencedirect.com/science/article/pii/S1877050924030217
Citations: 0

Abstract

Research interest in document analysis and optical character recognition (OCR) has grown markedly in recent years. OCR engines have advanced significantly across many languages, for both printed and handwritten documents. Arabic documents, however, have received comparatively less attention than documents in languages such as English. Several factors explain this gap, including the inherent challenges of the Arabic script and the limited availability of Arabic document datasets. Any OCR pipeline must first analyze the layout of a page image before passing it to the recognition stage. This thesis addresses layout analysis for historical Arabic documents using a deep learning (DL) approach based on the Mask Region-based Convolutional Neural Network (Mask R-CNN). The dataset consists of historical Arabic documents, mainly early printed ones, each with its own size, structure, and processing requirements. Historical documents are inherently harder to process because of factors such as layout structure, the authors' distinctive handwriting styles, paper aging, historical period, and ink properties. The model achieves an accuracy of 51.14%. Compared with existing models, the authors report that this result is state of the art.