Knowledge Graph Construction from Tables in Chinese Electric Power PDF Documents
Rong Zhang, Changlong Wang, Siyun Bi, Qibin Fu, Xingyu Li, Tingting Gan
Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things, published 2023-05-26
DOI: 10.1145/3603781.3603873
Citations: 0
Abstract
The PDF documents of electric power standards contain a large number of tables, whose layouts are designed mainly to present information intuitively to human readers. However, the data in these tables is not easily processed by computers, so its value is difficult to exploit. This paper proposes an approach for converting the tabular data in power standard PDF documents into RDF knowledge graphs. First, a variety of table processing techniques are used to extract tabular data from the PDFs. Then, the tabular data is normalized and transformed into RDF data. Finally, an electricity domain ontology is constructed, and the tabular data is mapped to ontology classes through table interpretation. Extensive experiments yield a structured RDF dataset of electricity standards, which provides the data support needed for intelligent knowledge services in the electricity industry.
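The abstract does not specify the normalization or table-interpretation algorithms. As a rough illustration of the normalize, transform, and map steps, here is a minimal sketch in Python (standard library only). It assumes a table has already been extracted from the PDF as rows of strings, with cells from vertically merged regions left empty. It forward-fills those cells, assigns the table an ontology class by a keyword match on its caption, and emits N-Triples with one subject per data row. The namespace `http://example.org/power#`, the `HEADER_TO_PROPERTY` and `KEYWORD_TO_CLASS` dictionaries, the fallback class `Entity`, and the sample rows are all hypothetical. This is not the authors' method.

```python
from urllib.parse import quote

# Hypothetical namespace for the electricity-domain ontology (not from the paper).
EX = "http://example.org/power#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Hypothetical header -> ontology property mapping (toy table interpretation).
HEADER_TO_PROPERTY = {
    "equipment": "equipmentName",
    "voltage level": "hasVoltageLevel",
    "rated current": "hasRatedCurrent",
}

# Hypothetical caption keyword -> ontology class mapping.
KEYWORD_TO_CLASS = {
    "transformer": "Transformer",
    "circuit breaker": "CircuitBreaker",
}


def normalize(body):
    """Strip whitespace and forward-fill empty cells (assumed to come from merged cells)."""
    cleaned = [[cell.strip() for cell in row] for row in body]
    for i in range(1, len(cleaned)):
        previous = cleaned[i - 1]
        for j, cell in enumerate(cleaned[i]):
            if cell == "" and j < len(previous):
                cleaned[i][j] = previous[j]
    return cleaned


def classify(caption):
    """Pick an ontology class by keyword match on the caption; fall back to 'Entity'."""
    lowered = caption.lower()
    for keyword, cls in KEYWORD_TO_CLASS.items():
        if keyword in lowered:
            return cls
    return "Entity"


def literal(value):
    """Escape a string as an N-Triples literal (backslash first, then quotes, newlines)."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def table_to_ntriples(caption, header, body, table_id):
    """Convert one table into N-Triples lines, one subject IRI per data row."""
    cls = classify(caption)
    keys = [h.strip().lower() for h in header]
    triples = []
    for r, row in enumerate(normalize(body), start=1):
        subject = f"<{EX}{quote(table_id)}_row{r}>"
        triples.append(f"{subject} {RDF_TYPE} <{EX}{cls}> .")
        for key, value in zip(keys, row):
            prop = HEADER_TO_PROPERTY.get(key)
            if prop is None or value == "":
                continue  # unmapped column or a cell that is still empty
            triples.append(f"{subject} <{EX}{prop}> {literal(value)} .")
    return triples


if __name__ == "__main__":
    # Invented sample data; the empty cell stands for a merged cell in the PDF.
    header = ["Equipment", "Voltage level", "Rated current"]
    body = [
        ["Main transformer", "110 kV", "630 A"],
        ["", "35 kV", "1250 A"],
    ]
    for line in table_to_ntriples("Transformer ratings (sample)", header, body, "T3"):
        print(line)
```

Forward-filling every empty cell is a simplification: a genuinely blank cell would also inherit the value above it, so a fuller pipeline would need the merged-cell geometry recovered during PDF extraction, and a keyword lookup is only a stand-in for proper table interpretation against the ontology.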