G. V. S. S. K. Naganjaneyulu, N. V. Sathwik, A. V. Narasimhadhan
2016 IEEE Region 10 Conference (TENCON), November 2016. DOI: 10.1109/TENCON.2016.7848210
A multi clue heuristic based algorithm for table detection
Research in document analysis and recognition has grown significantly over the past decade as office document automation has become essential to daily life. Text in documents can take different forms, such as handwritten text, printed text, headings, signatures, tables, and graphics. Table extraction plays a crucial role in layout analysis and in preserving the important information that tables contain. In this work, a multi-clue heuristic table detection algorithm based on Hough lines and Harris corners is proposed. Hough lines and Harris corner points are extracted from the document in two parallel processes. The clues from both processes are then matched in a nearest-neighbor framework to locate tables in the document. The proposed algorithm is a simple paradigm for extracting tables that are formed by ruling lines. Tested on different types of documents containing tables, the proposed algorithm achieves an accuracy of 89.7%.
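The abstract names the building blocks but not their exact combination. What follows is a minimal Python sketch of the matching stage under our own assumptions, not the authors' method. Hough line segments (for example, rows from OpenCV's `cv2.HoughLinesP`) and Harris corner points (for example, thresholded `cv2.cornerHarris` responses) are assumed to be already extracted and passed in as plain lists. Crossings of near-horizontal and near-vertical segments are kept only when a Harris corner lies nearby (a nearest-neighbor check). A region is reported only if at least two rulings in each direction are confirmed. The function names, tolerances (`angle_tol_deg`, `tol`, `max_dist`, `min_rulings`) and the bounding-box output are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
# (x1, y1, x2, y2): same layout as one row of cv2.HoughLinesP output (assumed input).
Segment = Tuple[float, float, float, float]


def split_rulings(segments: List[Segment], angle_tol_deg: float = 2.0):
    """Keep near-horizontal and near-vertical segments (candidate table rulings)."""
    horizontal, vertical = [], []
    for x1, y1, x2, y2 in segments:
        angle = abs(math.degrees(math.atan2(y2 - y1, x2 - x1))) % 180
        if angle <= angle_tol_deg or angle >= 180 - angle_tol_deg:
            horizontal.append((x1, y1, x2, y2))
        elif abs(angle - 90) <= angle_tol_deg:
            vertical.append((x1, y1, x2, y2))
    return horizontal, vertical


def ruling_intersections(horizontal, vertical, tol: float = 3.0) -> List[Point]:
    """Points where a horizontal and a vertical ruling cross (within tol pixels)."""
    points = []
    for hx1, hy1, hx2, hy2 in horizontal:
        hy = (hy1 + hy2) / 2
        for vx1, vy1, vx2, vy2 in vertical:
            vx = (vx1 + vx2) / 2
            if (min(hx1, hx2) - tol <= vx <= max(hx1, hx2) + tol
                    and min(vy1, vy2) - tol <= hy <= max(vy1, vy2) + tol):
                points.append((vx, hy))
    return points


def match_corners(intersections: List[Point], corners: List[Point],
                  max_dist: float = 5.0) -> List[Point]:
    """Keep intersections whose nearest Harris corner lies within max_dist."""
    if not corners:
        return []
    matched = []
    for p in intersections:
        nearest = min(corners, key=lambda c: math.dist(p, c))
        if math.dist(p, nearest) <= max_dist:
            matched.append(p)
    return matched


def detect_table(segments: List[Segment], corners: List[Point],
                 min_rulings: int = 2) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of one ruled table, or None."""
    horizontal, vertical = split_rulings(segments)
    junctions = match_corners(ruling_intersections(horizontal, vertical), corners)
    xs = {round(x) for x, _ in junctions}  # distinct confirmed vertical rulings
    ys = {round(y) for _, y in junctions}  # distinct confirmed horizontal rulings
    if len(xs) < min_rulings or len(ys) < min_rulings:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    # A 2x2-cell grid plus one diagonal stroke that is not a table ruling.
    grid = [(0, y, 200, y) for y in (0, 50, 100)] + \
           [(x, 0, x, 100) for x in (0, 100, 200)] + [(10, 10, 60, 60)]
    # Corners offset by one pixel, as a real detector might report them.
    corners = [(x + 1, y - 1) for x in (0, 100, 200) for y in (0, 50, 100)]
    print(detect_table(grid, corners))  # (0, 0, 200, 100)
```

Requiring both clues to agree is what makes this a multi-clue check: a stray line without corners, or corners from text strokes without crossing lines, does not produce a table. The sketch returns a single bounding box and counts rulings by rounding to whole pixels, which is crude; a document with several tables would first need the confirmed junctions grouped into clusters.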