Experiments on extracting structural information from paper documents using syntactic pattern analysis

Proceedings of 3rd International Conference on Document Analysis and Recognition
Pub Date: 1995-08-14
DOI: 10.1109/ICDAR.1995.599039

T. Bayer, H. Walischewski

Citations: 24

Abstract

Extracting structural information from paper documents supports daily document processing by, for example, automatically finding index terms and document topics. Knowledge about such components is modeled in a semantic net, which describes geometric properties, spatial relationships, lexical entities, and lexical relationships. The document model is used to extract the sender, date, recipient, and opening and closing formulas from a business letter. In total, 181 business letters were processed: 20 formed the training set and the remaining ones were used for testing. The error rates for the test set range from 0.022 to 0.049 at an average rejection rate of 0.4. Results show that the computational effort for matching can be limited to O(n^2), given n primitive objects.

