C. Strouthopoulos, N. Papamarkos, A. Atsalakis, C. Chamzas
Text identification in color documents
3rd International Symposium on Image and Signal Processing and Analysis (ISPA 2003), Proceedings. Published 2003-09-18.
DOI: 10.1109/ISPA.2003.1296366 (https://doi.org/10.1109/ISPA.2003.1296366)
In complex color documents, text, drawings, and graphics appear in millions of different colors. In many cases, text regions are overlaid on drawings or graphics. This paper proposes a new method to automatically detect and extract text in mixed-type color documents. The proposed method combines an adaptive color reduction (ACR) technique with a page layout analysis (PLA) approach. The ACR technique is used to obtain the optimal number of colors. The image is then split into separate binary images, one for each principal color. The PLA technique is applied independently to each of the color planes to identify the text regions. In the final stage, a merging procedure combines the text regions derived from the color planes to produce the final document.
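The split-and-merge steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a color reduction step has already produced a small palette of principal colors, and it uses plain nearest-color assignment as a stand-in for the ACR technique. The page layout analysis is not modeled at all: the masks passed to the merge step are simply assumed to be the per-plane text masks. The names nearest_color, split_into_planes, and merge_masks are hypothetical.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]   # RGB triple
Image = List[List[Color]]      # rows of pixels
Mask = List[List[int]]         # rows of 0/1 values


def nearest_color(pixel: Color, palette: List[Color]) -> int:
    """Return the index of the palette color closest to `pixel`
    (squared Euclidean distance in RGB; an assumed stand-in for ACR)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])),
    )


def split_into_planes(image: Image, palette: List[Color]) -> List[Mask]:
    """Build one binary plane per principal color: a pixel is 1 in the
    plane of the palette color it maps to and 0 in every other plane."""
    planes = [[[0] * len(row) for row in image] for _ in palette]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            planes[nearest_color(pixel, palette)][y][x] = 1
    return planes


def merge_masks(masks: List[Mask]) -> Mask:
    """Merge per-plane masks with a pixel-wise union (logical OR)."""
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [int(any(m[y][x] for m in masks)) for x in range(width)]
        for y in range(height)
    ]


# Tiny 2x3 example: near-black, near-white, and near-red pixels.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
image = [
    [(10, 10, 10), (250, 250, 250), (240, 20, 20)],
    [(255, 255, 255), (5, 0, 0), (200, 10, 10)],
]
planes = split_into_planes(image, palette)

# Assumption for illustration only: a layout analysis step marked the
# black and red planes as text, so those two planes are merged.
text_mask = merge_masks([planes[0], planes[2]])
```

Because every pixel maps to exactly one palette color, the planes partition the image. This is what allows each plane to be analyzed independently before the results are merged.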