Xilun Chen, Laura Chiticariu, Marina Danilevsky, A. Evfimievski, P. Sen
{"title":"一种理解财务表语义的矩形挖掘方法","authors":"Xilun Chen, Laura Chiticariu, Marina Danilevsky, A. Evfimievski, P. Sen","doi":"10.1109/ICDAR.2017.52","DOIUrl":null,"url":null,"abstract":"Financial statements report crucial information in tables with complex semantic structure, which are desirable, yet challenging, to interpret automatically. For example, in such tables a row of data cells is often explained by the headers of other rows. In a departure from prior art, we propose a rectangle mining framework for understanding complex tables, which considers rectangular regions rather than individual cells or pairs of cells in a table. We instantiate this framework with ReMine, an algorithm for extracting row header semantics of table, and show that it significantly outperforms prior pair-wise classification approaches on two datasets: (i) a set of manually labeled financial tables from multiple companies, and (ii) the ICDAR 2013 Table Competition dataset.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"A Rectangle Mining Method for Understanding the Semantics of Financial Tables\",\"authors\":\"Xilun Chen, Laura Chiticariu, Marina Danilevsky, A. Evfimievski, P. Sen\",\"doi\":\"10.1109/ICDAR.2017.52\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Financial statements report crucial information in tables with complex semantic structure, which are desirable, yet challenging, to interpret automatically. For example, in such tables a row of data cells is often explained by the headers of other rows. In a departure from prior art, we propose a rectangle mining framework for understanding complex tables, which considers rectangular regions rather than individual cells or pairs of cells in a table. We instantiate this framework with ReMine, an algorithm for extracting row header semantics of table, and show that it significantly outperforms prior pair-wise classification approaches on two datasets: (i) a set of manually labeled financial tables from multiple companies, and (ii) the ICDAR 2013 Table Competition dataset.\",\"PeriodicalId\":433676,\"journal\":{\"name\":\"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)\",\"volume\":\"66 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.2017.52\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2017.52","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Rectangle Mining Method for Understanding the Semantics of Financial Tables
Financial statements report crucial information in tables with complex semantic structure, which are desirable, yet challenging, to interpret automatically. For example, in such tables a row of data cells is often explained by the headers of other rows. In a departure from prior art, we propose a rectangle mining framework for understanding complex tables, which considers rectangular regions rather than individual cells or pairs of cells in a table. We instantiate this framework with ReMine, an algorithm for extracting row header semantics of table, and show that it significantly outperforms prior pair-wise classification approaches on two datasets: (i) a set of manually labeled financial tables from multiple companies, and (ii) the ICDAR 2013 Table Competition dataset.