Design of a mathematical expression recognition system

Proceedings of 3rd International Conference on Document Analysis and Recognition
Pub Date: 1995-08-14
DOI: 10.1109/ICDAR.1995.602097

Hsi-Jian Lee, Jiumn-Shine Wang

Citations: 67

Abstract

We present a system that segments and recognizes text and mathematical expressions in a document. The system has six stages: page segmentation and labeling, character segmentation, feature extraction, character recognition, expression formation, and error correction and expression extraction. In expression formation, we build a symbol relation tree for each text line to represent the relationships among the symbols in that line. Heuristic rules based on primitive tokens are used to correct recognition errors in a text line. We then extract all mathematical expressions according to a set of basic expression forms. The database currently contains 190 symbols. The average recognition rate is about 96.16%.
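The abstract does not say how the symbol relation tree is built, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each recognized symbol comes with a bounding box, and it links symbols in one text line by three relations chosen here for illustration. All names (`Symbol`, `relation`, `build_tree`, `linearize`), the relation labels, and the 0.25 threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (x_min, y_min, x_max, y_max); y grows downward, as in image coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Symbol:
    label: str
    box: Box
    # Assumed relation labels: "right", "superscript", "subscript".
    children: Dict[str, "Symbol"] = field(default_factory=dict)


def relation(parent: Symbol, child: Symbol, tol: float = 0.25) -> str:
    """Classify child against parent by the offset of their vertical centres,
    normalised by the parent's height (an assumed heuristic)."""
    height = max(parent.box[3] - parent.box[1], 1e-9)
    parent_mid = (parent.box[1] + parent.box[3]) / 2
    child_mid = (child.box[1] + child.box[3]) / 2
    offset = (child_mid - parent_mid) / height
    if offset < -tol:
        return "superscript"
    if offset > tol:
        return "subscript"
    return "right"


def build_tree(symbols: List[Symbol]) -> Symbol:
    """Scan one text line left to right and link each symbol to the current
    baseline symbol, either as its right neighbour or inside one of its scripts."""
    if not symbols:
        raise ValueError("a text line needs at least one symbol")
    ordered = sorted(symbols, key=lambda s: s.box[0])
    root = current = ordered[0]
    for sym in ordered[1:]:
        rel = relation(current, sym)
        if rel == "right":
            current.children["right"] = sym
            current = sym  # the baseline advances
        elif rel in current.children:
            # The script already exists: append to the end of its chain.
            tail = current.children[rel]
            while "right" in tail.children:
                tail = tail.children["right"]
            tail.children["right"] = sym
        else:
            current.children[rel] = sym
    return root


def linearize(node: Symbol) -> str:
    """Write the tree out in a LaTeX-like form."""
    out = node.label
    if "superscript" in node.children:
        out += "^{" + linearize(node.children["superscript"]) + "}"
    if "subscript" in node.children:
        out += "_{" + linearize(node.children["subscript"]) + "}"
    if "right" in node.children:
        out += " " + linearize(node.children["right"])
    return out


line = [
    Symbol("x", (0, 10, 10, 20)),
    Symbol("2", (11, 4, 16, 10)),
    Symbol("+", (18, 12, 26, 18)),
    Symbol("y", (28, 10, 38, 22)),
]
print(linearize(build_tree(line)))  # x^{2} + y
```

A tree of this kind keeps the two-dimensional layout explicit, so later stages such as error correction or expression extraction can work on relations between symbols and not on raw pixel positions. A real system would need more relations (fractions, radicals, limits) and more robust geometry than this single threshold.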
