Intelligent document processing system for conference article

Chun-Ming Tsai
2012 International Conference on Machine Learning and Cybernetics, published 2012-07-15
DOI: 10.1109/ICMLC.2012.6359665 (https://doi.org/10.1109/ICMLC.2012.6359665)

Abstract

Conventional document processing systems consist of document analysis (DA), document classification, and document understanding. These stages run sequentially, so an improper result in one stage propagates to the next. The DA methods are also inefficient on A4-sized images: binarization methods scan the entire color image at least once to threshold it, block segmentation methods scan the entire binary image at least twice, and layout analysis methods combine global and local analysis and scan the entire image at least once. This article proposes an intelligent, efficient, and effective document processing system to address these problems. The proposed method comprises document binarization and mixed-based layout analysis. The binarization method scans only the border image. The mixed-based layout analysis combines block segmentation and block classification: block segmentation scans only the background image, and block classification uses background gaps and writing format to classify blocks. Experimental results show that the proposed method outperforms FineReader 11.0 in visual measurement.
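The abstract does not say how the binarization uses the border image. One plausible reading, sketched below purely as an illustration, is that the page border is assumed to be mostly background, so a global threshold can be estimated from border pixels alone instead of from the whole image. The function names, the `margin` and `offset` parameters, and the mean-minus-offset rule are all assumptions made for this sketch, not the author's method.

```python
from statistics import mean


def border_pixels(gray, margin):
    """Yield only the pixels within `margin` rows/columns of the image edge.

    `gray` is a list of rows of grayscale values (0 = black, 255 = white).
    Interior pixels are never visited.
    """
    height, width = len(gray), len(gray[0])
    if margin < 1 or 2 * margin > min(height, width):
        raise ValueError("margin must be >= 1 and fit inside the image")
    for y in range(height):
        if y < margin or y >= height - margin:
            # Top and bottom bands: take the full row.
            yield from gray[y]
        else:
            # Middle rows: take only the left and right strips.
            yield from gray[y][:margin]
            yield from gray[y][width - margin:]


def estimate_threshold(gray, margin=2, offset=40):
    """Hypothetical rule: treat the border mean as the background level
    and place the threshold `offset` gray levels below it."""
    return mean(border_pixels(gray, margin)) - offset


def binarize(gray, threshold):
    """Mark pixels darker than the threshold as foreground (1)."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


# A light 6x6 page with a single dark pixel in the interior.
page = [[250] * 6 for _ in range(6)]
page[2][3] = 30
threshold = estimate_threshold(page, margin=1, offset=40)
binary = binarize(page, threshold)
```

In this sketch only the threshold estimation is restricted to the border; applying the threshold still touches every pixel once. Whether the paper avoids that pass as well cannot be determined from the abstract.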