2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR): Latest Publications

Discovering Qur’anic Knowledge through AQD: Arabic Qur’anic Database, a Multiple Resources Annotation-level Search

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-14 DOI: 10.1109/ASAR.2018.8480361

Sameer M. Alrehaili, E. Atwell

Citations: 1

A Hybrid Methods of Aligning Arabic Qur’anic Semantic Resources

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-14 DOI: 10.1109/ASAR.2018.8480309

Sameer M. Alrehaili, Mohammad M. Alqahtani, E. Atwell

Citations: 2

Information Extraction from Arabic and Latin scanned invoices

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-12 DOI: 10.1109/ASAR.2018.8480221

Najoua Rahal, Maroua Tounsi, M. B. Jlaiel, A. Alimi

Citations: 7

Urdu Natural Scene Character Recognition using Convolutional Neural Networks

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-12 DOI: 10.1109/ASAR.2018.8480202

Asghar Ali, M. Pickering, Kamran Shafi

Abstract: In this paper we investigate the challenging problem of cursive text recognition in natural scene images. In particular, we focus on isolated Urdu character recognition in natural scenes, which cannot be handled by traditional Optical Character Recognition (OCR) techniques developed for Arabic and Urdu scanned documents. We also present a dataset of Urdu characters segmented from images of signboards, street scenes, shop scenes and advertisement banners containing Urdu text. A variety of deep learning techniques have been proposed by researchers for natural scene text detection and recognition. In this work, a Convolutional Neural Network (CNN) is applied as a classifier, as CNN approaches have been reported to provide high accuracy for natural scene text detection and recognition. A dataset of manually segmented characters was developed, and deep learning based data augmentation techniques were applied to further increase the size of the dataset. The training is formulated using filter sizes of 3x3, 5x5, and mixed 3x3 and 5x5, with stride values of 1 and 2. The CNN model is trained with various learning rates and state-of-the-art results are achieved.
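The filter sizes and stride values named in the abstract above determine how quickly feature maps shrink from one layer to the next. The sketch below only illustrates the standard output-size arithmetic for a single convolution; the 32x32 input size, the zero padding, and the function name `conv_output_size` are assumptions made for this example and are not taken from the paper.

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int, padding: int = 0) -> int:
    """Spatial size of a convolution output along one dimension.

    Uses the standard formula floor((n + 2p - k) / s) + 1.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if kernel_size > input_size + 2 * padding:
        raise ValueError("kernel does not fit inside the padded input")
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Assumed input size for illustration only; the abstract does not state one.
INPUT_SIZE = 32

# Filter sizes and strides mentioned in the abstract.
grid = {
    (kernel, stride): conv_output_size(INPUT_SIZE, kernel, stride)
    for kernel in (3, 5)
    for stride in (1, 2)
}

for (kernel, stride), size in sorted(grid.items()):
    print(f"{kernel}x{kernel} filter, stride {stride}: {size}x{size} output")
```

Under these assumptions a stride of 2 roughly halves the feature map at each layer, which is one reason stride is usually tuned together with filter size.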

Citations: 22

LABA: Logical Layout Analysis of Book Page Images in Arabic Using Multiple Support Vector Machines

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-12 DOI: 10.1109/ASAR.2018.8480095

Wenda Qin, Randa I. Elanwar, Margrit Betke

Citations: 5

Towards the Machine Reading of Arabic Calligraphy: A Letters Dataset and Corresponding Corpus of Text

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-12 DOI: 10.1109/ASAR.2018.8480228

Seetah ALSalamah, Ross D. King

Abstract: Arabic calligraphy is one of the great art forms of the world. It displays Arabic phrases, commonly taken from the Holy Quran, in beautiful two-dimensional form. The use of two dimensions, and the interweaving of letters and words makes reading a far greater challenge for Artificial Intelligence (AI) than reading standard printed or hand-written Arabic. To approach this challenge, we have constructed a dataset of Arabic calligraphic letters, along with a corresponding corpus of phrases and quotes. The letters dataset contains a total of 3,467 images for 32 various categories of Arabic calligraphic-type letters. The associated text corpus contains 544 unique quoted phrases. These data were collected from various open sources on the web, and include examples from several Arabic calligraphic styles. We have also undertaken both an explorative statistical analysis of this data, and initial machine learning investigations. These analyses suggest that combining knowledge of a limited variety of Arabic calligraphy texts, with a successful machine will be sufficient for the machine reading of forms of Arabic calligraphy.

Citations: 5

ASAR 2018 Layout Analysis Challenge: Using Random Forests to Analyze Scanned Arabic Books

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-12 DOI: 10.1109/ASAR.2018.8480330

Rana S. M. Saad, Randa I. Elanwar, N. A. Kader, S. Mashali, Margrit Betke

Abstract: Physical Layout Analysis (PLA) is a necessary step to recognize the contents of a digital document. PLA includes segmenting the document image and identifying the content type of the segments. PLA for digitized Arabic documents is challenging due to the nature of the Arabic script. In this paper, we introduce a PLA system for Arabic documents that were digitized by scanning. Our system RFAAD, short for "Random Forests for Analyzing Arabic Documents," starts with morphological preprocessing of the digitized hard copy and then extracts geometrical, shape, and context features to identify the connected components (CC) of the digital image as containing text or non-text. Random forests are trained using the first dataset release of a large data collection project, BCE-Arabic-v1 [22]. Our system shows strong performance on BCE data in terms of CC classification accuracy and F1-score (97.5% and 97.7% respectively). When evaluated on datasets by other researchers [2], [11], RFAAD also performs well. Moreover, RFAAD shows moderately strong performance when applied to the most challenging layouts of the benchmarking dataset of the ASAR 2018 competition PLA-SAB. The performance of RFAAD suggests that our work, with some modifications, has the potential to solve other open problems in the document analysis area and attain a relatively high degree of generalization.
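The abstract above reports connected-component classification quality as accuracy and F1-score. As a reminder of how those two figures relate to the underlying text/non-text decisions, here is a minimal sketch that computes both from confusion counts. The counts and the names `ConfusionCounts`, `accuracy`, and `f1_score` are hypothetical and chosen for illustration; they are not the paper's data or code, and "text" is arbitrarily treated as the positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary text (positive) vs non-text (negative) classifier."""
    tp: int  # text components labelled text
    fp: int  # non-text components labelled text
    fn: int  # text components labelled non-text
    tn: int  # non-text components labelled non-text


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all components that were labelled correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denominator = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denominator if denominator else 0.0


# Hypothetical counts, not results from the paper.
example = ConfusionCounts(tp=90, fp=5, fn=10, tn=95)
print(f"accuracy = {accuracy(example):.3f}")
print(f"F1-score = {f1_score(example):.3f}")
```

Because F1 ignores true negatives while accuracy counts them, the two numbers can diverge when text and non-text components are unevenly represented on a page.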

Citations: 3

The ASAR 2018 Competition on Physical Layout Analysis of Scanned Arabic Books (PLA-SAB 2018)

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-12 DOI: 10.1109/ASAR.2018.8480194

Randa I. Elanwar, Margrit Betke

Citations: 3

A hybrid approach for standardized Dictionary-based knowledge extraction for Arabic morpho-semantic retrieval

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-12 DOI: 10.1109/ASAR.2018.8480178

Nadia Soudani, Ibrahim Bounhas, Y. Slimani

Citations: 1

Arabic words Recognition using CNN and TNN on a Smartphone

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-12 DOI: 10.1109/ASAR.2018.8480267

Alaa Alsaeedi, Hanan Al Mutawa, S. Snoussi, Sumayah Natheer, Kaouther Omri, Wisam Al Subhi

Abstract: Arabic script recognition has been challenging due to the variability of writing styles, the nature of Arabic script, the complexity of the processing steps and the variety of recognition methods. This paper uses a Convolutional Neural Network (CNN) for character recognition and a Transparent Neural Network (TNN) for word reading. Because Arabic character segmentation is a very complicated step, we recognize only the first and last characters of each connected component of the word, together with the isolated characters. A combination of the CNN and the TNN then completes the recognition of the whole word. A CNN is a multi-layer feed-forward neural network that extracts features and properties from the input data. A TNN is a special neural network that recognizes words from already activated characters and parts of words. These methods are already used in computer-based recognition systems. The proposed work integrates these methods and adapts them to the Android operating system so that they can be applied on a smartphone. The evaluation is done on a database of signboard images of printed town names, and the recognition rate is 98%.
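The abstract above describes recognizing only the first and last characters of each connected component and then completing the word from that partial evidence. The sketch below is a deliberately simplified stand-in for that idea: it filters a small lexicon by exact matching and is not the Transparent Neural Network from the paper. The lexicon entries are hypothetical Latin-letter placeholders that stand in for pre-segmented Arabic town names, and the names `LEXICON`, `observe`, and `candidates` are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

# (first character, last character) recognized for one connected component.
Observation = Tuple[str, str]

# Hypothetical lexicon: each word is already split into connected components.
# Latin placeholders are used here instead of real Arabic town names.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "tunis": ("tu", "nis"),
    "tunas": ("tu", "nas"),
    "tozeur": ("to", "zeu", "r"),
    "tabarka": ("ta", "ba", "r", "ka"),
}


def observe(components: Sequence[str]) -> List[Observation]:
    """Keep only the first and last character of each component.

    An isolated character is a component of length 1, so both entries coincide.
    """
    return [(component[0], component[-1]) for component in components]


def candidates(observed: Sequence[Observation],
               lexicon: Dict[str, Tuple[str, ...]] = LEXICON) -> List[str]:
    """Return lexicon words whose components agree with the observed characters."""
    target = list(observed)
    return sorted(word for word, parts in lexicon.items() if observe(parts) == target)


# Three components, the last one an isolated character: a single match.
print(candidates([("t", "o"), ("z", "u"), ("r", "r")]))
# Two components whose inner characters are unknown: two words remain possible.
print(candidates([("t", "u"), ("n", "s")]))
```

The second query shows why boundary characters alone can leave ambiguity in a lexicon with similar entries; in this toy version the ambiguity is simply returned to the caller rather than resolved.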

Citations: 5