A multi-clue heuristic based algorithm for table detection

G. V. S. S. K. Naganjaneyulu, N. V. Sathwik, A. V. Narasimhadhan
Published in: 2016 IEEE Region 10 Conference (TENCON)
Publication date: 2016-11-01
DOI: 10.1109/TENCON.2016.7848210 (https://doi.org/10.1109/TENCON.2016.7848210)
Citations: 4

Abstract

Research in document analysis and document recognition has grown rapidly over the past decade as the automation of office documents has become essential to daily life. Text in documents can take different forms, such as handwritten text, printed text, headings, signatures, tables, and graphics. Table extraction plays a crucial role in layout analysis and in preserving the important information that tables contain. This work proposes a multi-clue heuristic table detection algorithm based on Hough lines and Harris corners. Hough lines and Harris corner points are extracted from the document in two parallel processes. The clues extracted by both processes are matched using a nearest-neighbor framework to locate the tables in the document. The proposed algorithm is a simple approach for extracting tables that are formed by lines. Its performance was tested on different types of documents containing tables, achieving an accuracy of 89.7%.
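The abstract does not spell out how the two clue sets are combined, so the following is only a hypothetical reading of the nearest-neighbor matching step, written in plain Python. It assumes the Hough step has already produced axis-aligned horizontal and vertical segments and the Harris step a list of corner coordinates (in practice these might come from OpenCV's `cv2.HoughLinesP` and `cv2.cornerHarris`, which are not shown). A candidate region is accepted as a table when most line crossings have a Harris corner nearby. The function names and the `radius`, `min_ratio`, and `min_points` thresholds are illustrative assumptions, not values from the paper.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # ((x0, y0), (x1, y1))
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersections(h_lines: List[Segment], v_lines: List[Segment],
                  tol: float = 2.0) -> List[Point]:
    """Crossing points between horizontal and vertical segments."""
    points = []
    for (hx0, hy), (hx1, _) in h_lines:
        for (vx, vy0), (_, vy1) in v_lines:
            # A crossing exists if the vertical line's x lies within the
            # horizontal span and the horizontal line's y within the vertical span.
            if (min(hx0, hx1) - tol <= vx <= max(hx0, hx1) + tol
                    and min(vy0, vy1) - tol <= hy <= max(vy0, vy1) + tol):
                points.append((vx, hy))
    return points


def match_clues(line_points: List[Point], corners: List[Point],
                radius: float = 5.0) -> List[Point]:
    """Keep line crossings whose nearest Harris corner lies within `radius`."""
    matched = []
    for p in line_points:
        nearest = min(corners, key=lambda c: math.dist(p, c), default=None)
        if nearest is not None and math.dist(p, nearest) <= radius:
            matched.append(p)
    return matched


def detect_table(h_lines: List[Segment], v_lines: List[Segment],
                 corners: List[Point], radius: float = 5.0,
                 min_ratio: float = 0.8, min_points: int = 4) -> Optional[Box]:
    """Return a bounding box if both clues agree, else None (hypothetical rule)."""
    crossings = intersections(h_lines, v_lines)
    if len(crossings) < min_points:
        return None  # too few line crossings to form even one cell
    matched = match_clues(crossings, corners, radius)
    if len(matched) / len(crossings) < min_ratio:
        return None  # the corner detector does not confirm the grid
    xs = [x for x, _ in matched]
    ys = [y for _, y in matched]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    # A 2x2-cell ruled table: 3 horizontal and 3 vertical lines.
    h = [((0, y), (30, y)) for y in (0, 10, 20)]
    v = [((x, 0), (x, 20)) for x in (0, 15, 30)]
    # Simulated Harris responses, each offset by one pixel from a crossing.
    corners = [(x + 1, y - 1) for x, y in intersections(h, v)]
    print(detect_table(h, v, corners))  # (0, 0, 30, 20)
```

The design choice in this sketch is that a grid of lines alone is not trusted: a candidate only counts as a table when the independent corner detector confirms most of its crossings, and with no corners (or only a few) the same lines are rejected.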