Towards the Machine Reading of Arabic Calligraphy: A Letters Dataset and Corresponding Corpus of Text

Seetah ALSalamah, Ross D. King
{"title":"Towards the Machine Reading of Arabic Calligraphy: A Letters Dataset and Corresponding Corpus of Text","authors":"Seetah ALSalamah, Ross D. King","doi":"10.1109/ASAR.2018.8480228","DOIUrl":null,"url":null,"abstract":"Arabic calligraphy is one of the great art forms of the world. It displays Arabic phrases, commonly taken from the Holy Quran, in beautiful two-dimensional form. The use of two dimensions, and the interweaving of letters and words makes reading a far greater challenge for Artificial Intelligence (AI) than reading standard printed or hand-written Arabic. To approach this challenge, we have constructed a dataset of Arabic calligraphic letters, along with a corresponding corpus of phrases and quotes. The letters dataset contains a total of 3,467 images for 32 various categories of Arabic calligraphic-type letters. The associated text corpus contains 544 unique quoted phrases. These data were collected from various open sources on the web, and include examples from several Arabic calligraphic styles. We have also undertaken both an explorative statistical analysis of this data, and initial machine learning investigations. These analyses suggest that combining knowledge of a limited variety of Arabic calligraphy texts, with a successful machine will be sufficient for the machine reading of forms of Arabic calligraphy.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAR.2018.8480228","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Arabic calligraphy is one of the great art forms of the world. It displays Arabic phrases, commonly taken from the Holy Quran, in beautiful two-dimensional form. The use of two dimensions, and the interweaving of letters and words makes reading a far greater challenge for Artificial Intelligence (AI) than reading standard printed or hand-written Arabic. To approach this challenge, we have constructed a dataset of Arabic calligraphic letters, along with a corresponding corpus of phrases and quotes. The letters dataset contains a total of 3,467 images for 32 various categories of Arabic calligraphic-type letters. The associated text corpus contains 544 unique quoted phrases. These data were collected from various open sources on the web, and include examples from several Arabic calligraphic styles. We have also undertaken both an explorative statistical analysis of this data, and initial machine learning investigations. These analyses suggest that combining knowledge of a limited variety of Arabic calligraphy texts, with a successful machine will be sufficient for the machine reading of forms of Arabic calligraphy.
迈向阿拉伯书法的机器阅读:一个字母数据集和相应的文本语料库
阿拉伯书法是世界上最伟大的艺术形式之一。它以美丽的二维形式显示阿拉伯语短语,这些短语通常取自《可兰经》。对于人工智能(AI)来说,二维的使用以及字母和单词的交织使得阅读比阅读标准印刷或手写的阿拉伯语要困难得多。为了应对这一挑战,我们构建了一个阿拉伯书法字母的数据集,以及相应的短语和引语语料库。字母数据集包含32个不同类别的阿拉伯书法类型字母的总共3,467张图像。相关的文本语料库包含544个唯一的引用短语。这些数据是从网络上的各种开放资源中收集的,包括几种阿拉伯书法风格的例子。我们还对这些数据进行了探索性的统计分析,并进行了初步的机器学习调查。这些分析表明,结合有限种类的阿拉伯书法文本的知识,与一个成功的机器将足以为机器阅读形式的阿拉伯书法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信