A Novel Dataset for Known and Unknown Ancient Arabic Manuscripts
Lutfieh S. Al-homed, K. M. Jambi, Hassanin M. Al-Barhamtoshy
2022 20th International Conference on Language Engineering (ESOLEC), published 2022-10-12
DOI: https://doi.org/10.1109/ESOLEC54569.2022.10009168
Citation count: 1
Abstract
This paper presents a new dataset of ancient Arabic-Islamic manuscripts for detecting unknown manuscripts and distinguishing them from known ones. Unknown manuscripts are those damaged by human or natural forces, such as humidity, temperature, and air pollution, which have degraded their quality and caused the loss of identifying information such as the title, author, and date. Known manuscripts, by contrast, have a known title, author, and other identifying information. Recognizing unknown manuscripts is essential for further analysis: it facilitates information extraction from degraded manuscripts, enables their indexing, and makes them easier to access and retrieve. The objectives of the constructed dataset are as follows: 1) Collect a set of known and unknown manuscripts of similar forms and highlight the characteristics of the unknown manuscripts. 2) Promote the automatic detection and recognition of unknown manuscripts. 3) Formulate the recognition of unknown manuscripts as a supervised machine-learning problem and improve it using advances in machine learning and deep learning. A total of 108 manuscripts were collected, split equally between the known and unknown categories. Preliminary results show that a decision tree classifier achieves an accuracy of 88% in classifying unknown manuscripts.
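The abstract does not say which features or which implementation the authors used, so the following is only a minimal sketch of how a decision tree can separate two classes in a supervised setting. It assumes two hypothetical features per manuscript (`has_title_page` and `legible_ratio`) and a handful of made-up toy rows; none of these names or values come from the dataset, and the sketch does not reproduce the reported 88% accuracy.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree; leaves are class labels, inner nodes are tuples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Toy data (hypothetical, NOT from the paper): [has_title_page, legible_ratio]
train_X = [[1, 0.9], [1, 0.8], [1, 0.7], [0, 0.4], [0, 0.3], [0, 0.6]]
train_y = ["known", "known", "known", "unknown", "unknown", "unknown"]

tree = build_tree(train_X, train_y)
```

Calling `predict(tree, row)` on a new feature row returns either `"known"` or `"unknown"`. In practice accuracy would be measured on held-out manuscripts, not on the training rows, and a library implementation would likely replace this hand-written tree.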