2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR): Latest Publications

Space Anomalies in OCRs for Arabic Like Scripts
Riaz Ahmad, Muhammad Zeshan Afzal, Sheikh Faisal Rashid, M. Liwicki, A. Dengel
Abstract: This paper investigates and analyses the nature of errors occurring in Optical Character Recognition (OCR) for Arabic-like scripts. Existing research in the area of OCR for Arabic-like scripts often focuses on achieving the best performance in terms of character error rates. Little effort targets the analysis of the nature of the errors (anomalies) that may occur. One such important anomaly is the Space Anomaly. This anomaly is due to the presence of breaker characters, which are an essential part of Arabic-like scripts. The spaces introduced by breaker characters are not depicted in the ground truth, making it hard for the OCR to generalize. The OCR model either learns to inhibit the original spaces or to generate extra spaces at places where they are not correct. Due to this confusion, the rendering looks sub-optimal. This paper analyses and removes space anomalies. We present a joint approach that not only performs OCR but also handles the space anomalies in a robust manner, hence significantly outperforming the state of the art. Although the implication of the work is shown by an improved character recognition rate, the impact of this research is much higher in terms of the correctness of the OCR for useful purposes, especially for rendering. The claim is supported by empirical evaluation, and it is shown that the proposed approach achieved the best results.
DOI: https://doi.org/10.1109/ASAR.2018.8480229
Published: 2018-03-12
Citations: 3
Developing Bilingual Arabic-English Ontologies of Al-Quran
Mohammad M. Alqahtani, E. Atwell
Abstract: The main aim of developing a Quranic ontology is to facilitate the retrieval of knowledge from Al-Quran. Additionally, Quranic ontologies will enrich the raw Arabic and English Quran text with Islamic semantic tags. However, current Quran ontologies have different scopes, formats, and entity names for the same concepts. Additionally, a single Quranic ontology does not cover most of the knowledge in Al-Quran. Therefore, these ontologies need to be increased, normalised, aligned, and combined with other Quran resources such as Quran chapter and verse names, Quran word meanings, and other Quranic datasets. This paper reviews current Quran ontologies and datasets. Then, it presents several stages for developing Arabic-English Quran ontologies from different datasets related to Al-Quran.
DOI: https://doi.org/10.1109/ASAR.2018.8480237
Published: 2018-03-12
Citations: 5
Diacritization of a Highly Cited Text: A Classical Arabic Book as a Case
A. Alosaimy, E. Atwell
Abstract: We present a robust and accurate diacritization method for highly cited texts that automatically "borrows" diacritization from similar contexts. This method has been tested on diacritizing one book, "Riyad As-Salheen", for the purpose of morphological annotation of the Sunnah Arabic Corpus. The original source of Riyad is about 48.66% diacritized, and after borrowing diacritization, the percentage jumps to 76.41% with a low diacritic error rate (DER = 0.004), compared to 61.73% (DER = 0.214) using the MADAMIRA toolkit and 67.68% (DER = 0.006) using the Farasa toolkit. More importantly, this method has reduced the word ambiguity from 4.83 diacritized forms per word to 1.91.
DOI: https://doi.org/10.1109/ASAR.2018.8480176
Published: 2018-03-12
Citations: 6
Arabic Bank Cheque Words Recognition Using Gabor Features
Q. Al-Nuzaili, S. Al-Maadeed, Hanadi Hassen, Ali Hamdi
Abstract: Arabic cheque processing is one of the important applications of handwriting recognition. The recognition of Arabic bank cheques still requires considerable work in its constituent stages, which include pre-processing, feature extraction, and classification. Several feature extraction methods have been used to recognize handwritten digits and words. Stroke direction is an important feature of Arabic handwriting, and the Gabor filter has proved its ability to detect this local structural feature. On the other hand, investigating different classifiers can improve the recognition accuracy. In this paper, Gabor features are investigated with ELM and SMO classifiers. Two Arabic cheque datasets, AHDB and CENPARMI, are used for evaluation. The results from Gabor features with the SMO classifier outperform previous studies.
DOI: https://doi.org/10.1109/ASAR.2018.8480197
Published: 2018-03-01
Citations: 6
A Reliable Method to Predict Parkinson’s Disease Stage and Progression based on Handwriting and Re-sampling Approaches
C. Taleb, M. Khachab, C. Mokbel, Laurence Likforman-Sulem
Abstract: A reliable system is developed that relies on algorithms assisting the decision-making process to diagnose Parkinson’s disease (PD) at an early stage and to predict the Hoehn & Yahr (H&Y) stage and the Unified Parkinson’s Disease Rating Scale (UPDRS) score. In a previous work [3], we used features extracted from Arabic handwriting for diagnosing PD as a binary decision. In this work, we use these features to construct a prediction model that evaluates the H&Y stage and the UPDRS scores. A multi-class support vector machine (SVM) classifier is trained using re-sampling approaches such as the adaptive synthetic sampling approach (ADASYN). The classifier is evaluated with 4-fold cross-validation. The experiments show that the H&Y stage, UPDRS scores, and total UPDRS can be predicted with accuracies of 94%, 92%, and 88% respectively. The proposed method can be implemented as an efficient clinical decision support system for early detection and for monitoring the progression of PD.
DOI: https://doi.org/10.1109/ASAR.2018.8480209
Published: 2018-03-01
Citations: 9
Deep Convolutional Neural Network for Recognition of Unified Multi-Language Handwritten Numerals
Ghazanfar Latif, J. Alghazo, Loay Alzubaidi, M. Naseer, Yazan Alghazo
Abstract: Deep learning systems have recently gained importance as the architecture of choice in artificial intelligence (AI). Handwritten numeral recognition is essential for the development of systems that can accurately recognize digits in different languages, which is a challenging task due to varying writing styles. Developing an optimized multi-language, writer-independent technique for numerals is still an open area of research. In this paper, we propose a deep learning architecture for the recognition of handwritten multi-language numerals (mixed numerals belonging to multiple languages: Eastern Arabic, Persian, Devanagari, Urdu, Western Arabic). The overall accuracy on the combined multi-language database was 99.26%, with a precision of 99.29% on average. The average accuracy for each individual language was found to be 99.322%. Results indicate that the proposed deep learning architecture produces better results than the methods suggested in the previous literature.
DOI: https://doi.org/10.1109/ASAR.2018.8480289
Published: 2018-03-01
Citations: 30
Synthesizing versus Augmentation for Arabic Word Recognition with Convolutional Neural Networks
Reem Alaasam, Berat Kurar Barakat, Jihad El-Sana
Abstract: In this paper, we present a sub-word recognition method for historical Arabic manuscripts using convolutional neural networks. We investigate the benefit of extending the training set with synthetically created samples in comparison to augmentation. We show that annotating around ten pages of a manuscript and extending it is sufficient for successful sub-word recognition in the whole manuscript. In addition, we show the contribution of using different combinations of training sets and compare their sub-word recognition performance in the whole manuscript.
DOI: https://doi.org/10.1109/ASAR.2018.8480189
Published: 2018-03-01
Citations: 3
An Arabic Word Similarity Measure for Semantic Conversational Agents
Z. Noori, Keeley A. Crockett, Z. Bandar, Mohammed Al-Mousa
Abstract: Word similarity measures are used to measure the semantic relatedness between two words. Whereas traditional English measures exist, relatively little research has been undertaken in developing such measures for Modern Standard Arabic, largely due to the linguistic challenges of the language. Domain coverage is also an issue when looking to select the best measure for incorporation into a semantic conversational agent. The information source used within the measure should be general yet capable of dealing with domain-specific language to ensure robust and appropriate responses. This paper proposes a word similarity measure that utilises the length and depth of the words within a domain-specific lexical tree that is used as the information source. The measure is compared with an existing Arabic word similarity measure through evaluation on a generic published dataset, and the results show that the new measure gives high correlation with human ratings.
DOI: https://doi.org/10.1109/ASAR.2018.8480252
Published: 2018-03-01
Citations: 1
Binarization Free Layout Analysis for Arabic Historical Documents Using Fully Convolutional Networks
Berat Kurar Barakat, Jihad El-Sana
Abstract: We present a Fully Convolutional Network based method for layout analysis of non-binarized historical Arabic manuscripts. The document image is segmented into main text and side text regions by dense pixel prediction. The convolutional part of the network can learn useful features from the non-binarized document images and is robust to degradation and unconstrained layouts. We have evaluated the proposed method on a private dataset containing challenging historical Arabic manuscripts to demonstrate its effectiveness.
DOI: https://doi.org/10.1109/ASAR.2018.8480333
Published: 2018-03-01
Citations: 17
Adoptive Thresholding and Geometric Features based Physical Layout Analysis of Scanned Arabic Books
Maitham A. Al-Dobais, F. Alrasheed, Ghazanfar Latif, Loay Alzubaidi
Abstract: In the digital age, developing an automated system to convert old printed books into digital form is a challenging task. In this paper, we propose a novel technique for the recognition of Arabic scanned documents with both normal and complex layouts. The proposed algorithm is based on local adaptive thresholding and geometric features, which, to the authors’ knowledge, are applied here for the first time to Arabic document image recognition based on Physical Layout Analysis (PLA). The proposed method was applied to a dataset consisting of 90 images collected from 700 books from various publishers and containing a total of 1112 zones of three types: text zones, image zones, and graphic zones. The proposed algorithm achieved promising results, with an overall average recognition of 86.71% for Text and Image block regions for all three sets. The proposed novel algorithm outperforms the techniques mentioned in previous literature.
DOI: https://doi.org/10.1109/ASAR.2018.8480378
Published: 2018-03-01
Citations: 5