Extracting Linked Data from HTML Tables
Ahmed Ktob, Zhoujun Li, D. Bouchiha
2017 IEEE 3rd International Conference on Collaboration and Internet Computing (CIC), published 2017-10-01
DOI: https://doi.org/10.1109/CIC.2017.00018

The web plays a crucial role in daily life, and its openness gives users access to data around the clock. Data has recently become more exploitable by machines thanks to the newly introduced mechanism of linked data, which dramatically improves the quality of data published on the web. We therefore attempt to reuse the investment in data that already exists on the web, particularly in web applications, to generate linked data. To this end, we propose a set of transformation rules that extract data from HTML tables and convert them into RDF (Resource Description Framework) triples. Our approach builds on the direct conversion of relational data into RDF triples proposed by the W3C. The extraction of RDF triples is automatic, except for the detection of primary and foreign keys, which remains manual. We have also developed a tool, HTML2RDF, that carries out the extraction process. The results obtained with HTML2RDF were promising; however, their quality depends on the correct determination of primary and foreign keys.
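
To make the idea of converting an HTML table into RDF triples concrete, the following is a minimal sketch in Python. It is not the HTML2RDF tool and does not reproduce the paper's transformation rules, which the abstract does not list. It only assumes the general shape of a direct mapping: each row becomes a subject IRI derived from the table name and the primary key value, each column becomes a predicate, and each cell becomes a literal. The names `TableParser`, `table_to_triples`, `SAMPLE_HTML`, the table name `People`, and the base IRI `http://example.org/` are all hypothetical. As in the abstract, the primary key is supplied by hand; foreign keys are left out for brevity.

```python
from html.parser import HTMLParser
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TableParser(HTMLParser):
    """Collects every <tr> in the document as a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_triples(html, table_name, primary_key, base="http://example.org/"):
    """Map a table with a header row to (subject, predicate, object) tuples.

    Assumption: the first row holds the column names, and `primary_key`
    names one of those columns (chosen manually by the caller).
    """
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *records = parser.rows
    key_index = header.index(primary_key)
    table_iri = base + quote(table_name)
    triples = []
    for record in records:
        # Subject IRI built from the table name and the primary key value.
        subject = f"{table_iri}/{quote(primary_key)}={quote(record[key_index])}"
        triples.append((subject, RDF_TYPE, table_iri))
        # One literal-valued triple per column.
        for column, value in zip(header, record):
            triples.append((subject, f"{table_iri}#{quote(column)}", value))
    return triples


SAMPLE_HTML = """
<table>
  <tr><th>id</th><th>name</th></tr>
  <tr><td>1</td><td>Alice</td></tr>
  <tr><td>2</td><td>Bob</td></tr>
</table>
"""

for triple in table_to_triples(SAMPLE_HTML, "People", "id"):
    print(triple)
```

Choosing a different `primary_key` column changes every subject IRI, which illustrates why the quality of the output depends on identifying the keys correctly.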