{"title":"从公共数据构建用于预测分析的知识图谱:预测技术未来时空的案例研究","authors":"Weiwei Duan, Yao-Yi Chiang","doi":"10.1145/3006386.3006388","DOIUrl":null,"url":null,"abstract":"A domain expert can process heterogeneous data to make meaningful interpretations or predictions from the data. For example, by looking at research papers and patent records, an expert can determine the maturity of an emerging technology and predict the geographic location(s) and time (e.g., in a certain year) where and when the technology will be a success. However, this is an expert- and manual-intensive task. This paper presents an end-to-end system that integrates heterogeneous data sources into a knowledge graph in the RDF (Resource Description Framework) format using an ontology. Then the user can easily query the knowledge graph to prepare the required data for different types of predictive analysis tools. We show a case study of predicting the (geographic) center(s) of fuel cell technologies using data collected from public sources to demonstrate the feasibility of our system. The system extracts, cleanses, and augments data from public sources including research papers and patent records. Next, the system uses an ontology-based data integration method to generate knowledge graphs in the RDF format to enable users to switch quickly between machine learning models for predictive analytic tasks. We tested the system using the Support Vector Machine and Multiple Hidden Markov Models and achieved 66.7% and 83.3% accuracy on the city and year levels of spatial and temporal resolutions, respectively.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Building knowledge graph from public data for predictive analysis: a case study on predicting technology future in space and time\",\"authors\":\"Weiwei Duan, Yao-Yi Chiang\",\"doi\":\"10.1145/3006386.3006388\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A domain expert can process heterogeneous data to make meaningful interpretations or predictions from the data. For example, by looking at research papers and patent records, an expert can determine the maturity of an emerging technology and predict the geographic location(s) and time (e.g., in a certain year) where and when the technology will be a success. However, this is an expert- and manual-intensive task. This paper presents an end-to-end system that integrates heterogeneous data sources into a knowledge graph in the RDF (Resource Description Framework) format using an ontology. Then the user can easily query the knowledge graph to prepare the required data for different types of predictive analysis tools. We show a case study of predicting the (geographic) center(s) of fuel cell technologies using data collected from public sources to demonstrate the feasibility of our system. The system extracts, cleanses, and augments data from public sources including research papers and patent records. Next, the system uses an ontology-based data integration method to generate knowledge graphs in the RDF format to enable users to switch quickly between machine learning models for predictive analytic tasks. 
We tested the system using the Support Vector Machine and Multiple Hidden Markov Models and achieved 66.7% and 83.3% accuracy on the city and year levels of spatial and temporal resolutions, respectively.\",\"PeriodicalId\":416086,\"journal\":{\"name\":\"International Workshop on Analytics for Big Geospatial Data\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Workshop on Analytics for Big Geospatial Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3006386.3006388\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Analytics for Big Geospatial Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3006386.3006388","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Building knowledge graph from public data for predictive analysis: a case study on predicting technology future in space and time
A domain expert can process heterogeneous data to draw meaningful interpretations or predictions from it. For example, by examining research papers and patent records, an expert can assess the maturity of an emerging technology and predict where (geographic location) and when (e.g., in which year) the technology will succeed. However, this is a manual, expertise-intensive task. This paper presents an end-to-end system that integrates heterogeneous data sources into a knowledge graph in the RDF (Resource Description Framework) format using an ontology. Users can then query the knowledge graph to prepare the data required by different types of predictive analysis tools. To demonstrate the feasibility of the system, we present a case study that predicts the geographic center(s) of fuel cell technology using data collected from public sources. The system extracts, cleanses, and augments data from public sources including research papers and patent records. It then applies an ontology-based data integration method to generate knowledge graphs in the RDF format, enabling users to switch quickly between machine learning models for predictive analytics tasks. We tested the system with a Support Vector Machine and Multiple Hidden Markov Models, achieving 66.7% accuracy at the city level (spatial resolution) and 83.3% at the year level (temporal resolution).
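To give a concrete, purely illustrative picture of the pipeline the abstract describes, the sketch below (assuming Python with rdflib and scikit-learn, which the paper does not specify) loads toy paper/patent records into an RDF graph under a small made-up ontology, prepares a (city, year, count) table with a SPARQL query, and trains a Support Vector Machine on it. The namespace, the class and property names (ex:PatentRecord, ex:city, ex:year, ex:count), the toy data, and the "technology center" threshold are all assumptions for illustration, not the paper's actual ontology or features.

```python
# Hypothetical sketch of the abstract's pipeline: RDF knowledge graph -> SPARQL query -> ML model.
# Ontology terms and data below are illustrative assumptions, not the authors' schema.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD
from sklearn.svm import SVC

EX = Namespace("http://example.org/techkg#")  # made-up namespace
g = Graph()
g.bind("ex", EX)

# Toy records standing in for extracted paper/patent metadata: (record id, city, year, count).
records = [
    ("r1", "Tokyo", 2010, 14),
    ("r2", "Tokyo", 2011, 21),
    ("r3", "Seoul", 2010, 6),
    ("r4", "Seoul", 2011, 9),
]
for rid, city, year, count in records:
    node = EX[rid]
    g.add((node, RDF.type, EX.PatentRecord))
    g.add((node, EX.city, Literal(city)))
    g.add((node, EX.year, Literal(year, datatype=XSD.integer)))
    g.add((node, EX.count, Literal(count, datatype=XSD.integer)))

# SPARQL query that prepares the (city, year, count) rows for a downstream model.
rows = g.query("""
    PREFIX ex: <http://example.org/techkg#>
    SELECT ?city ?year ?count WHERE {
        ?r a ex:PatentRecord ;
           ex:city ?city ;
           ex:year ?year ;
           ex:count ?count .
    }
""")

# Turn the query results into a tiny training set: label a city/year pair as a
# "technology center" when the record count reaches an arbitrary threshold (illustrative only).
X, y = [], []
for city, year, count in rows:
    X.append([int(year), int(count)])
    y.append(1 if int(count) >= 10 else 0)

clf = SVC(kernel="rbf")
clf.fit(X, y)
print(clf.predict([[2012, 18]]))  # classify a new (year, count) observation
```

Because the feature table comes directly out of a SPARQL query over the knowledge graph, swapping the SVM for another model (for example, an HMM-based temporal model) would only change the last few lines, which mirrors the quick model-switching the abstract emphasizes.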