Text recognition for information retrieval in images of printed circuit boards

Wei Li, Stefan Neullens, Matthias Breier, Marcel Bosling, T. Pretz, D. Merhof
Venue: IECON 2014 - 40th Annual Conference of the IEEE Industrial Electronics Society
Published: 2014-10-01
DOI: 10.1109/IECON.2014.7049016 (https://doi.org/10.1109/IECON.2014.7049016)
Citations: 24

Abstract

Efficient and environmentally friendly recycling of printed circuit boards (PCBs) requires a comprehensive analysis of their material composition. Besides sophisticated chemical and physical methods for direct material analysis, an indirect method based on information retrieval provides a less costly and more efficient alternative. During information retrieval, PCBs and their components need to be recognized based on their appearance and the corresponding text information. Their material composition is then available through a pre-established database. Practical text recognition is therefore necessary for successful data analysis prior to PCB recycling. This paper focuses on two key aspects of text recognition: binarization, and the final recognition of text objects using optical character recognition (OCR) engines. For the binarization of text content, a novel local thresholding method using an adaptive window size together with background estimation is presented. Several state-of-the-art algorithms and the proposed method were evaluated to compare their binarization performance on text objects in PCB images. On a data set containing manually created references, the novel method provides superior results. Furthermore, in contrast to previous work on text recognition, an additional evaluation of available open-source OCR engines was conducted to assess the technical limitations of OCR applications. We show that the quality of text recognition can be significantly improved if the binarization approach accounts for these technical limitations of OCR software. The presented method and results are also expected to improve OCR performance in other applications.
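
The abstract does not describe the proposed binarization method in detail, so the following is not the authors' algorithm. It is a minimal sketch of generic local thresholding, meant only to illustrate the idea of comparing each pixel with a brightness estimate of its neighborhood. The function names, the fixed `window` size, and the `offset` parameter are assumptions made for this illustration; the paper's method instead uses an adaptive window size and a background estimate.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table for a 2-D list of numbers."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def local_mean_threshold(img, window=3, offset=10):
    """Mark a pixel as text (1) if it is darker than its local mean by more than `offset`.

    `window` and `offset` are hypothetical parameters chosen for this sketch.
    """
    h, w = len(img), len(img[0])
    r = window // 2
    sat = integral_image(img)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        # Clip the window at the image border.
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            area = (y1 - y0) * (x1 - x0)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            local_mean = total / area
            out[y][x] = 1 if img[y][x] < local_mean - offset else 0
    return out


# Toy example: a dark vertical stroke on a bright background.
image = [
    [200, 200, 200, 200, 200],
    [200, 200,  50, 200, 200],
    [200, 200,  50, 200, 200],
    [200, 200,  50, 200, 200],
    [200, 200, 200, 200, 200],
]
binary = local_mean_threshold(image, window=3, offset=10)
for row in binary:
    print(row)
```

In this toy image only the three dark stroke pixels are marked as text. The summed-area table keeps the cost of each window mean constant, which is one common way to make local thresholding practical on larger images.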