Biotable: A Tool to Extract Semantic Structure of Table in Biology Literature
Daipeng Luo, Jing Peng, Yuhua Fu
Proceedings of the 5th International Conference on Bioinformatics Research and Applications, 2018-12-27
DOI: 10.1145/3309129.3309139 (https://doi.org/10.1145/3309129.3309139)
The volume of published biological literature increases year by year, and important information in biomedical articles may appear only in tables. However, research on information extraction from tables is rare. Currently there are two approaches to table mining. In the first, researchers convert the document to HTML format, but the conversion quality is poor. In the second, researchers use documents in XML format directly, but the number of available XML documents is limited. To solve this problem, we propose Biotable, a tool for mining biological tables in PDF documents. After converting each page of the PDF into an image, we use the concept of Connected Value to locate the table boundary and each cell. When analyzing the table header, we convert all heterogeneous table headers into a single row, which gives a better understanding of the semantics of each column. Based on Biotable and the pipeline proposed by QTLMiner, we performed a table mining experiment on QTLMiner's dataset. Table detection achieved a precision of 98.12% and a recall of 93.14%. The recall of QTL statements is 86.53%.
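The header step described above (collapsing heterogeneous, multi-row headers into one row) might look roughly like the following sketch. It is not Biotable's implementation: the function name `flatten_headers`, the separator, and the cell encoding are all assumptions made for illustration. In particular, it assumes a header is already available as a list of rows, where `None` marks a cell covered by a horizontal span from its left neighbor and `""` marks a genuinely empty cell.

```python
from typing import List, Optional


def flatten_headers(header_rows: List[List[Optional[str]]],
                    sep: str = " / ") -> List[str]:
    """Collapse a multi-row table header into one label per column.

    Assumed encoding (hypothetical, not taken from the paper):
    None -> cell is covered by a horizontal span and inherits the
            label of the nearest labeled cell to its left;
    ""   -> cell is genuinely empty and contributes nothing.
    """
    if not header_rows:
        return []

    width = max(len(row) for row in header_rows)

    # Step 1: resolve horizontal spans so every cell carries its own label.
    filled: List[List[str]] = []
    for row in header_rows:
        padded = list(row) + [""] * (width - len(row))
        last = ""
        resolved = []
        for cell in padded:
            if cell is not None:
                last = cell.strip()
            resolved.append(last)
        filled.append(resolved)

    # Step 2: join the labels of each column from top to bottom,
    # skipping empty cells and immediate repeats.
    flat = []
    for col in range(width):
        parts: List[str] = []
        for row in filled:
            label = row[col]
            if label and (not parts or parts[-1] != label):
                parts.append(label)
        flat.append(sep.join(parts))
    return flat


# Made-up example header: "QTL" spans two columns in the top row.
example = [
    ["Trait", "QTL", None, "Chr"],
    ["", "Position (cM)", "LOD", ""],
]
print(flatten_headers(example))
# ['Trait', 'QTL / Position (cM)', 'QTL / LOD', 'Chr']
```

Joining parent and child labels keeps the group name ("QTL") attached to each sub-column, so each column's meaning can be read from a single row.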