Layout Analysis on Challenging Historical Arabic Manuscripts using Siamese Network

2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI:10.1109/ICDAR.2019.00123

Reem Alaasam, Berat Kurar, Jihad El-Sana

Citations: 10

Abstract

This paper presents a layout analysis method for historical Arabic documents using a Siamese network. Given pages from different documents, we divide them into patches of similar size. We train a Siamese network model that takes a pair of patches as input and outputs a distance that corresponds to the similarity between the two patches. We use the trained model to compute a distance matrix, which in turn is used to cluster the patches of a page as main text, side text, or background. We evaluate our method on a challenging dataset of historical Arabic manuscripts and report the F-measure. We demonstrate the effectiveness of our method by comparing it with other works that use deep learning approaches, and show that it achieves state-of-the-art results.
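The pipeline described in the abstract (patches, pairwise distances, distance matrix, clustering into three classes) can be illustrated with a minimal sketch. The abstract does not specify the network architecture, the patch size, or the clustering algorithm, so everything concrete below is an assumption made for illustration: `toy_embedding` is a hypothetical stand-in for one branch of a trained Siamese network, and `k_medoids` is one possible way to cluster from a precomputed distance matrix, not necessarily the authors' choice.

```python
import math
from typing import Callable, List

Patch = List[List[float]]  # grayscale patch: rows of pixel values in [0, 1]


def split_into_patches(page: Patch, size: int) -> List[Patch]:
    """Cut a page into non-overlapping size x size patches (ragged edges are dropped)."""
    rows, cols = len(page), len(page[0])
    return [
        [row[c:c + size] for row in page[r:r + size]]
        for r in range(0, rows - size + 1, size)
        for c in range(0, cols - size + 1, size)
    ]


def toy_embedding(patch: Patch) -> List[float]:
    """Hypothetical stand-in for a trained Siamese branch: (mean, variance) of the pixels."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, var]


def distance_matrix(
    patches: List[Patch],
    embed: Callable[[Patch], List[float]] = toy_embedding,
) -> List[List[float]]:
    """Siamese-style distances: every patch goes through the same embedding, then pairs are compared."""
    vectors = [embed(p) for p in patches]
    return [[math.dist(u, v) for v in vectors] for u in vectors]


def k_medoids(dist: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Cluster items using only a precomputed distance matrix; returns one label per item."""
    n = len(dist)
    # Deterministic farthest-point initialisation of the k medoids.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(iterations):
        # Assign each item to its nearest medoid.
        labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
        # Move each medoid to the member with the smallest total distance to its cluster.
        new_medoids = []
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if members:
                new_medoids.append(min(members, key=lambda i: sum(dist[i][m] for m in members)))
            else:
                new_medoids.append(medoids[j])  # keep the old medoid for an empty cluster
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels
```

In a real system the embedding would come from the trained network, and the three resulting clusters would still need to be mapped to main text, side text, and background; the sketch only groups patches and does not name the groups.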

