OCR and Levenshtein distance as a measure of image quality accuracy for identification documents
Kreshnik Vukatana
2022 International Conference on Electrical, Computer and Energy Technologies (ICECET), published 2022-07-20
DOI: 10.1109/ICECET55527.2022.9872824 (https://doi.org/10.1109/ICECET55527.2022.9872824)
Optical Character Recognition (OCR) is a technology used to recognize printed or handwritten text characters inside digital images. Its applications range from text editors, where scanned images are converted to text, to text recognition systems in which license plates are identified through a camera. The model proposed in this paper combines OCR with a text-matching algorithm to decide whether an image has good quality and clear readability. The sample dataset is based on identification documents, such as a health insurance card. The main objective of the designed model is to improve the pre-processing phase of dataset creation for AI-based document classification models. In that phase it can serve as a boundary for the processed images, removing low-quality images from the input data.
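The abstract does not spell out how the OCR output and the text-matching step are combined, so the following is only a minimal sketch of one plausible reading: compare the OCR text against a known reference string using Levenshtein distance, normalize the distance to a similarity score, and accept the image if the score reaches a cutoff. The function names, the reference string, and the 0.8 threshold are assumptions made for illustration and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(ocr_text: str, reference: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(ocr_text), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(ocr_text, reference) / longest


def is_readable(ocr_text: str, reference: str, threshold: float = 0.8) -> bool:
    """Hypothetical quality gate: threshold is an assumed value, not the paper's."""
    return similarity(ocr_text, reference) >= threshold


if __name__ == "__main__":
    reference = "HEALTH INSURANCE CARD"  # assumed text expected on the document
    for ocr_text in ("HEA1TH 1NSURANCE CARD", "H3 LT# |N5UR"):
        print(ocr_text, round(similarity(ocr_text, reference), 3),
              is_readable(ocr_text, reference))
```

In a real pipeline the `ocr_text` argument would come from an OCR engine run on the scanned image; which engine the paper uses is not stated here, so the sketch takes the recognized text as a plain string. Normalizing by the longer string's length keeps the score comparable across documents with different amounts of text.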