2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR): Latest Articles

Discovering Qur’anic Knowledge through AQD: Arabic Qur’anic Database, a Multiple Resources Annotation-level Search
Sameer M. Alrehaili, E. Atwell
{"title":"Discovering Qur’anic Knowledge through AQD: Arabic Qur’anic Database, a Multiple Resources Annotation-level Search","authors":"Sameer M. Alrehaili, E. Atwell","doi":"10.1109/ASAR.2018.8480361","DOIUrl":"https://doi.org/10.1109/ASAR.2018.8480361","url":null,"abstract":"This paper introduces a novel resource for Arabic Qur’anic textual annotations: AQD, Arabic Qur’anic Database, providing an annotation-level search that draws on a number of available resources in a single query. In addition, it allows implementing a set of queries as rewrite rules, which is performed in a recursive way. The experiments show that our AQD is able to discover knowledge from very simple to very complex queries.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122583042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Hybrid Methods of Aligning Arabic Qur’anic Semantic Resources
Sameer M. Alrehaili, Mohammad M. Alqahtani, E. Atwell
{"title":"A Hybrid Methods of Aligning Arabic Qur’anic Semantic Resources","authors":"Sameer M. Alrehaili, Mohammad M. Alqahtani, E. Atwell","doi":"10.1109/ASAR.2018.8480309","DOIUrl":"https://doi.org/10.1109/ASAR.2018.8480309","url":null,"abstract":"Ontology alignment is a necessary step for enabling interoperability between ontology entities and for avoiding redundancy and variation that may occur when integrating them. The automation of bilingual ontology alignment is challenging due to the variation an entity can be expressed in, in different ontologies and languages. The goal of this paper is to compare various ontology alignment methods for matching ontological bilingual Qur’anic resources and to go beyond them, which is achieved via a new hybrid alignment method. The new method consists of aggregating multiple similarity measures for a given pair of concepts into a single value, taking advantage of combining fuzzy bilingual lexical and structure-based methods for improving the performance of automatic ontology alignment.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124171035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Information Extraction from Arabic and Latin scanned invoices
Najoua Rahal, Maroua Tounsi, M. B. Jlaiel, A. Alimi
{"title":"Information Extraction from Arabic and Latin scanned invoices","authors":"Najoua Rahal, Maroua Tounsi, M. B. Jlaiel, A. Alimi","doi":"10.1109/ASAR.2018.8480221","DOIUrl":"https://doi.org/10.1109/ASAR.2018.8480221","url":null,"abstract":"The relevant entity extraction from scanned document image is a very challenging task due to highly heterogeneous templates, and several structure layouts. These problems lead to inaccuracy for document image recognized by OCR. In this paper, we propose an effective solution for these problems, in which the relevant entities are extracted from Arabic and Latin scanned invoices. The input of the system is an invoice image which is submitted to an OCR without layout analysis. After, invoices are labeled in the text recognized by the OCR. By combining the logical and physical structures, a local graph model is built for extraction entity. Finally, we implement a correction module which requires the mislabeling correction by eliminating the superfluous parts detected by labeling step. We evaluate the obtained results with 1050 real invoices as reported in experimental section.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127364178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Urdu Natural Scene Character Recognition using Convolutional Neural Networks
Asghar Ali, M. Pickering, Kamran Shafi
{"title":"Urdu Natural Scene Character Recognition using Convolutional Neural Networks","authors":"Asghar Ali, M. Pickering, Kamran Shafi","doi":"10.1109/ASAR.2018.8480202","DOIUrl":"https://doi.org/10.1109/ASAR.2018.8480202","url":null,"abstract":"In this paper we investigate the challenging problem of cursive text recognition in natural scene images. In particular, we have focused on isolated Urdu character recognition in natural scenes that could not be handled by traditional Optical Character Recognition (OCR) techniques developed for Arabic and Urdu scanned documents. We also present a dataset of Urdu characters segmented from images of signboards, street scenes, shop scenes and advertisement banners containing Urdu text. A variety of deep learning techniques have been proposed by researchers for natural scene text detection and recognition. In this work, a Convolutional Neural Network (CNN) is applied as a classifier, as CNN approaches have been reported to provide high accuracy for natural scene text detection and recognition. A dataset of manually segmented characters was developed and deep learning based data augmentation techniques were applied to further increase the size of the dataset. The training is formulated using filter sizes of 3x3, 5x5 and mixed 3x3 and 5x5 with a stride value of 1 and 2. The CNN model is trained with various learning rates and state-of-the-art results are achieved.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125194704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
LABA: Logical Layout Analysis of Book Page Images in Arabic Using Multiple Support Vector Machines
Wenda Qin, Randa I. Elanwar, Margrit Betke
{"title":"LABA: Logical Layout Analysis of Book Page Images in Arabic Using Multiple Support Vector Machines","authors":"Wenda Qin, Randa I. Elanwar, Margrit Betke","doi":"10.1109/ASAR.2018.8480095","DOIUrl":"https://doi.org/10.1109/ASAR.2018.8480095","url":null,"abstract":"Logical layout analysis, which determines the function of a document region, for example, whether it is a title, paragraph, or caption, is an indispensable part in a document understanding system. Rule-based algorithms have long been used for such systems. The datasets available have been small, and so the generalization of the performance of these systems is difficult to assess. In this paper, we present LABA, a supervised machine learning system based on multiple support vector machines for conducting a logical Layout Analysis of scanned pages of Books in Arabic. Our system labels the function (class) of a document(scanned book pages) region, based on its position on the page and other features. We evaluated LABA with the benchmark \"BCE-Arabic-v1\" dataset, which contains scanned pages of illustrated Arabic books. We obtained high recall and precision values, and found that the F-measure of LABA is higher for all classes except the \"noise\" class compared to a neural network method that was based on prior work.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133954776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Towards the Machine Reading of Arabic Calligraphy: A Letters Dataset and Corresponding Corpus of Text
Seetah ALSalamah, Ross D. King
{"title":"Towards the Machine Reading of Arabic Calligraphy: A Letters Dataset and Corresponding Corpus of Text","authors":"Seetah ALSalamah, Ross D. King","doi":"10.1109/ASAR.2018.8480228","DOIUrl":"https://doi.org/10.1109/ASAR.2018.8480228","url":null,"abstract":"Arabic calligraphy is one of the great art forms of the world. It displays Arabic phrases, commonly taken from the Holy Quran, in beautiful two-dimensional form. The use of two dimensions, and the interweaving of letters and words makes reading a far greater challenge for Artificial Intelligence (AI) than reading standard printed or hand-written Arabic. To approach this challenge, we have constructed a dataset of Arabic calligraphic letters, along with a corresponding corpus of phrases and quotes. The letters dataset contains a total of 3,467 images for 32 various categories of Arabic calligraphic-type letters. The associated text corpus contains 544 unique quoted phrases. These data were collected from various open sources on the web, and include examples from several Arabic calligraphic styles. We have also undertaken both an explorative statistical analysis of this data, and initial machine learning investigations. These analyses suggest that combining knowledge of a limited variety of Arabic calligraphy texts, with a successful machine will be sufficient for the machine reading of forms of Arabic calligraphy.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132249682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
ASAR 2018 Layout Analysis Challenge: Using Random Forests to Analyze Scanned Arabic Books
Rana S. M. Saad, Randa I. Elanwar, N. A. Kader, S. Mashali, Margrit Betke
{"title":"ASAR 2018 Layout Analysis Challenge: Using Random Forests to Analyze Scanned Arabic Books","authors":"Rana S. M. Saad, Randa I. Elanwar, N. A. Kader, S. Mashali, Margrit Betke","doi":"10.1109/ASAR.2018.8480330","DOIUrl":"https://doi.org/10.1109/ASAR.2018.8480330","url":null,"abstract":"Physical Layout Analysis (PLA) is a necessary step to recognize the contents of a digital document. PLA includes segmenting the document image and identifying the content type of the segments. PLA for digitized Arabic documents is challenging due to the nature of the Arabic script. In this paper, we introduce a PLA system for Arabic documents that were digitized by scanning. Our system RFAAD, short for \"Random Forests for Analyzing Arabic Documents,\" starts with morphological preprocessing of the digitized hard copy and then extracts geometrical, shape, and context features to identify the connected components (CC) of the digital image as containing text or non-text. Random forests are trained using the first dataset release of a large data collection project, BCE-Arabic-v1 [22]. Our system shows strong performance on BCE data in terms of CC classification accuracy and F1-score (97.5% and 97.7% respectively). When evaluated on datasets by other researchers [2], [11], RFAAD also performs well. Moreover, RFAAD shows moderately strong performance when applied to the most challenging layouts of the benchmarking dataset of the ASAR 2018 competition PLA-SAB. The performance of RFAAD suggests that our work, with some modifications, has the potential to solve other open problems in the document analysis area and attain a relatively high degree of generalization.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115899345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
The ASAR 2018 Competition on Physical Layout Analysis of Scanned Arabic Books (PLA-SAB 2018)
Randa I. Elanwar, Margrit Betke
{"title":"The ASAR 2018 Competition on Physical Layout Analysis of Scanned Arabic Books (PLA-SAB 2018)","authors":"Randa I. Elanwar, Margrit Betke","doi":"10.1109/ASAR.2018.8480194","DOIUrl":"https://doi.org/10.1109/ASAR.2018.8480194","url":null,"abstract":"Successful physical layout analysis (PLA) is a key factor in the performance of text recognizers and many other applications. PLA solutions for scanned Arabic documents are few and difficult to compare due to differences in methods, data, and evaluation metrics. To help evaluate the performance of recent Arabic PLA solutions, the ASAR 2018 Competition on Physical Layout Analysis (PLA) was organized. This paper presents the results of this competition. The competition focused on analyzing layouts for Arabic scanned book pages (SAB). PLA-SAB required solutions of two tasks: page-to-block segmentation and block text/non-text classification. In this paper we briefly describe the methods provided by participating teams, present their results for both tasks using the BCE-Arabic benchmarking dataset [1], and make an open call for continuous participation outside the context of ASAR 2018.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130937806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A hybrid approach for standardized Dictionary-based knowledge extraction for Arabic morpho-semantic retrieval
Nadia Soudani, Ibrahim Bounhas, Y. Slimani
{"title":"A hybrid approach for standardized Dictionary-based knowledge extraction for Arabic morpho-semantic retrieval","authors":"Nadia Soudani, Ibrahim Bounhas, Y. Slimani","doi":"10.1109/ASAR.2018.8480178","DOIUrl":"https://doi.org/10.1109/ASAR.2018.8480178","url":null,"abstract":"We propose in this paper to exploit Arabic dictionaries to enhance Arabic Information Retrieval (IR). We use standardized LMF dictionaries. We first put forward to mine such dictionaries and to represent them into graph-based representation. This graph will also be mined with a hybrid approach that combines both linguistic and statistical techniques to extract useful knowledge for IR. We study how extracted knowledge from such resource and added to the initial queries can attentively affect the retrieval process and results. Several query expansion strategies are carried based on morphological, semantic and morpho-semantic queries terms relations.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125308542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Arabic words Recognition using CNN and TNN on a Smartphone
Alaa Alsaeedi, Hanan Al Mutawa, S. Snoussi, Sumayah Natheer, Kaouther Omri, Wisam Al Subhi
{"title":"Arabic words Recognition using CNN and TNN on a Smartphone","authors":"Alaa Alsaeedi, Hanan Al Mutawa, S. Snoussi, Sumayah Natheer, Kaouther Omri, Wisam Al Subhi","doi":"10.1109/ASAR.2018.8480267","DOIUrl":"https://doi.org/10.1109/ASAR.2018.8480267","url":null,"abstract":"Arabic script recognition has been challenging due to the variability of writing styles, to the nature of Arabic scripts, to the complexities of processing steps and to the varieties of recognition methods. This paper uses a Convolutional Neural Network (CNN) for character recognition and Transparent Neural Network (TNN) for words reading. Because Arabic character segmentation is a very complicated step, we recognize only the first, the last character of all connected components of the recognized word and the isolated ones. A combination between the CNN and the TNN will complete the recognition of the whole word. CNN is a multi-layer feed-forward neural network that extracts features and properties from the input data. TNN is a special NN that recognizes words from already activated characters and part of words. These methods are already used in computer recognition systems. The proposed work is to integrate these methods and adapt them to the Android operating system to apply them on a smartphone. The evaluation is done on a database of Signboards Images of printed town names and the recognition rate is 98%.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134310520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5