Model-Based Tabular Structure Detection and Recognition in Noisy Handwritten Documents
Jin Chen, D. Lopresti
2012 International Conference on Frontiers in Handwriting Recognition, 2012-09-18
DOI: 10.1109/ICFHR.2012.233 (https://doi.org/10.1109/ICFHR.2012.233)
Citations: 8
Abstract
Tabular structure detection and recognition can be a valuable step in the analysis of unstructured documents. The noisy handwritten documents we analyze may contain pre-printed rulings as the substrate, hand-drawn rulings, machine-printed text, handwritten text, and signatures, in addition to the tabular structures that we wish to decompose into basic cells, rows, and columns. Although work has been done on machine-printed documents, noisy handwritten documents may require modified or new techniques. In this work, we aim to simultaneously detect tabular structures and decompose them into 2-D grids of table cells. First, we detect "key points" that help determine the physical and logical structure of tables. Then, we use the 2-D grid assumption to build grids of key points. Finally, we extract structural features for the Min-Cut/Max-Flow algorithm to recognize tabular structures. Experiments on 22 tables containing 584 table cells show a cell precision of 100% and a cell recall of 93.3%.
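To make the final step more concrete, the following is a minimal sketch of how a Min-Cut/Max-Flow formulation can label candidate key points as "table" or "background". The abstract does not specify the structural features or the graph construction, so everything here is an assumption for illustration: the function name `min_cut_labels`, the per-point costs, the neighbor weights, and the chain of five points are all hypothetical, and the max-flow routine is a generic Edmonds-Karp implementation rather than the authors' method.

```python
from collections import deque


def min_cut_labels(unary, edges):
    """Binary labeling via an s-t minimum cut (Edmonds-Karp max flow).

    unary: list of (cost_if_table, cost_if_background), one pair per point.
    edges: list of (i, j, weight); weight is paid when i and j get different labels.
    Returns a list of bools, True where the point is labeled "table".
    """
    n = len(unary)
    s, t = n, n + 1  # extra source and sink nodes
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # make sure the reverse arc exists

    for i, (cost_table, cost_bg) in enumerate(unary):
        add(s, i, cost_bg)     # cut when i falls on the sink side (background)
        add(i, t, cost_table)  # cut when i stays on the source side (table)
    for i, j, w in edges:
        add(i, j, w)
        add(j, i, w)

    while True:
        # Breadth-first search for a shortest augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Points still reachable from the source in the residual graph are "table".
    return [i in parent for i in range(n)]


# Hypothetical costs for five candidate key points along one ruling.
unary = [(1, 9), (2, 8), (5, 4), (1, 9), (9, 1)]
edges = [(0, 1, 3), (1, 2, 3), (2, 3, 3), (3, 4, 3)]
labels = min_cut_labels(unary, edges)
print(labels)
```

In this toy input, the third point on its own slightly prefers "background" (cost 4 versus 5), but the penalty for disagreeing with its two neighbors outweighs that preference, so the cut keeps it with the table. The last point is labeled background because its own evidence is stronger than the single neighbor penalty.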