Novel approach to background-text-nontext separation in ancient degraded document images
Authors: D. Asatryan, Grigor Sazhumyan, Lusine Aznauryan
Published in: 2017 Computer Science and Information Technologies (CSIT), 2017-09-01
DOI: https://doi.org/10.1109/CSITECHNOL.2017.8312161
Large numbers of handwritten and printed ancient documents now need to be digitized for automated processing and analysis. This paper proposes an approach to background-text-nontext separation based on differences in the sizes of the objects present in a document image, which can be obtained by binarization and segmentation algorithms. The image is first binarized by a suitable method and then segmented, and the distribution of segment sizes is obtained. The three types of objects present in an image are assumed to have significantly different sizes; the separation problem therefore reduces to partitioning the set of segments into three groups. The thresholds separating these groups can be found by minimizing the intrasample (within-group) variation used in discriminant analysis. Examples of images from the Matenadaran collection are considered, and the separated parts of each image are illustrated and interpreted.
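The threshold-selection step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each segment size is a single number (for example, a pixel count), measures the intrasample variation as the sum of squared deviations from each group's mean, and searches all pairs of cut points exhaustively. The function names `within_group_ss` and `two_thresholds` are hypothetical.

```python
from itertools import combinations


def within_group_ss(sorted_sizes, cuts):
    """Sum of squared deviations from each group's own mean.

    `cuts` are indices into the sorted list that mark where one group
    ends and the next begins.
    """
    bounds = [0, *cuts, len(sorted_sizes)]
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        group = sorted_sizes[lo:hi]
        mean = sum(group) / len(group)
        total += sum((x - mean) ** 2 for x in group)
    return total


def two_thresholds(sizes):
    """Find two thresholds that split the sizes into three groups.

    Assumption: the best split is the one with the smallest total
    within-group sum of squares. Each threshold is placed midway
    between the two neighbouring sizes at the chosen cut.
    """
    s = sorted(sizes)
    if len(s) < 3:
        raise ValueError("need at least three segment sizes")
    # Try every pair of cut positions that leaves all three groups non-empty.
    i, j = min(
        combinations(range(1, len(s)), 2),
        key=lambda cuts: within_group_ss(s, cuts),
    )
    return (s[i - 1] + s[i]) / 2, (s[j - 1] + s[j]) / 2


# Made-up segment sizes: small, medium, and large objects.
sizes = [2, 3, 2, 4, 3, 40, 45, 50, 42, 980, 1000, 1020]
t1, t2 = two_thresholds(sizes)
print(t1, t2)  # 22.0 515.0
```

The exhaustive search is cubic in the number of segments, so it only suits small inputs; for a real page with thousands of segments, a search over a histogram of sizes or over prefix sums would be a more practical variant of the same idea.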