Author: Basima Z. Yacob
Journal: International Journal of Computer Applications Technology and Research
DOI: 10.7753/ijcatr0402.1006
Haralick Texture Features based Syriac(Assyrian) and English or Arabic documents Classification
Script identification is an essential step before running a script-specific OCR system. Automatic script identification from document images supports many important applications, such as sorting and transcribing multilingual documents and indexing large collections of such images, and it serves as a precursor to optical character recognition (OCR). This paper discriminates between Syriac and English documents, or between Syriac and Arabic documents, by extracting Haralick texture features. Texture is investigated as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. A K-nearest-neighbour algorithm is then used to classify 300 text blocks into one of two scripts (Syriac or English, or Syriac or Arabic) based on the Haralick texture features. The text blocks were presented to the system at different rotation angles between 0º and 135º, and the recognition results were good.
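The pipeline described above (co-occurrence texture statistics followed by nearest-neighbour voting) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say which Haralick features, how many gray levels, or which value of K were used. The sketch assumes four common statistics (angular second moment, contrast, inverse difference moment, entropy), averages them over the 0º, 45º, 90º and 135º co-occurrence directions, and uses K = 3. The synthetic striped and checkered blocks, the labels `script_a` and `script_b`, and all function names are hypothetical stand-ins for real text blocks.

```python
import math
from collections import Counter

# (row step, column step) offsets for the 0, 45, 90 and 135 degree directions.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def glcm(image, offset, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1  # count the pair in both orders
                counts[b][a] += 1  # so the matrix is symmetric
    total = sum(map(sum, counts)) or 1.0
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Angular second moment, contrast, inverse difference moment, entropy."""
    asm = contrast = idm = entropy = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            asm += v * v
            contrast += (i - j) ** 2 * v
            idm += v / (1 + (i - j) ** 2)
            if v > 0:
                entropy -= v * math.log(v)
    return [asm, contrast, idm, entropy]


def describe(image, levels=2):
    """Feature vector: each statistic averaged over the four directions."""
    per_dir = [texture_features(glcm(image, o, levels)) for o in OFFSETS]
    return [sum(col) / len(per_dir) for col in zip(*per_dir)]


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k training vectors closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


# Hypothetical binary "text blocks": two synthetic textures, not real scripts.
def horizontal_stripes(n):
    return [[r % 2 for _ in range(n)] for r in range(n)]


def vertical_stripes(n):  # horizontal stripes rotated by 90 degrees
    return [[c % 2 for c in range(n)] for _ in range(n)]


def checkerboard(n):
    return [[(r + c) % 2 for c in range(n)] for r in range(n)]


train_x, train_y = [], []
for n in (8, 10, 12):
    train_x.append(describe(horizontal_stripes(n)))
    train_y.append("script_a")
    train_x.append(describe(checkerboard(n)))
    train_y.append("script_b")

prediction_rotated = knn_predict(train_x, train_y, describe(vertical_stripes(8)))
prediction_checker = knn_predict(train_x, train_y, describe(checkerboard(14)))
print(prediction_rotated, prediction_checker)
```

In this toy setting, a block rotated by 90º yields the same direction-averaged feature vector as the unrotated block, which is one plausible reason texture statistics can tolerate rotated input. Whether the paper handles rotation this way is an assumption, not something the abstract states.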