利用语义依赖检测异常语义Web数据

2011 IEEE Fifth International Conference on Semantic Computing Pub Date : 2011-09-18 DOI:10.1109/ICSC.2011.81

Yang Yu, Yingjie Li, J. Heflin

{"title":"利用语义依赖检测异常语义Web数据","authors":"Yang Yu, Yingjie Li, J. Heflin","doi":"10.1109/ICSC.2011.81","DOIUrl":null,"url":null,"abstract":"Data quality is a critical problem for the Semantic Web. We propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. Inspired by data dependency, which has shown promise in database data quality research, we introduce Semantic Dependency to assess quality of Semantic Web data. The system first builds a summary graph for finding candidate semantic dependencies. Each semantic dependency has a probability according to its instantiations and is subsequently adjusted based on the inconsistencies among them. Then triples can get a posterior probability of normality based on what semantic dependencies can support each of them. Repeating the iteration above, the proposed approach detects abnormal Semantic Web data. Experiments have shown that the system is efficient on data set with 10M triples and has more than a ten percent F-score improvement over our previous system.","PeriodicalId":408382,"journal":{"name":"2011 IEEE Fifth International Conference on Semantic Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Detecting Abnormal Semantic Web Data Using Semantic Dependency\",\"authors\":\"Yang Yu, Yingjie Li, J. Heflin\",\"doi\":\"10.1109/ICSC.2011.81\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Data quality is a critical problem for the Semantic Web. We propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. Inspired by data dependency, which has shown promise in database data quality research, we introduce Semantic Dependency to assess quality of Semantic Web data. The system first builds a summary graph for finding candidate semantic dependencies. Each semantic dependency has a probability according to its instantiations and is subsequently adjusted based on the inconsistencies among them. Then triples can get a posterior probability of normality based on what semantic dependencies can support each of them. Repeating the iteration above, the proposed approach detects abnormal Semantic Web data. Experiments have shown that the system is efficient on data set with 10M triples and has more than a ten percent F-score improvement over our previous system.\",\"PeriodicalId\":408382,\"journal\":{\"name\":\"2011 IEEE Fifth International Conference on Semantic Computing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE Fifth International Conference on Semantic Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSC.2011.81\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE Fifth International Conference on Semantic Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSC.2011.81","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

数据质量是语义网的一个关键问题。我们提出，一个三元组偏离相似三元组的程度可以作为识别错误的重要启发式。受数据依赖关系(data dependency)的启发，我们引入语义依赖关系(Semantic dependency)来评估语义Web数据的质量。系统首先构建一个摘要图来查找候选语义依赖关系。每个语义依赖根据其实例具有一个概率，随后根据它们之间的不一致性进行调整。然后三元组可以根据语义依赖关系来得到正态性的后验概率。重复上述迭代，提出的方法检测异常语义Web数据。实验表明，该系统在具有10M三元组的数据集上是有效的，并且比我们以前的系统提高了10%以上的F-score。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Detecting Abnormal Semantic Web Data Using Semantic Dependency

Data quality is a critical problem for the Semantic Web. We propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. Inspired by data dependency, which has shown promise in database data quality research, we introduce Semantic Dependency to assess quality of Semantic Web data. The system first builds a summary graph for finding candidate semantic dependencies. Each semantic dependency has a probability according to its instantiations and is subsequently adjusted based on the inconsistencies among them. Then triples can get a posterior probability of normality based on what semantic dependencies can support each of them. Repeating the iteration above, the proposed approach detects abnormal Semantic Web data. Experiments have shown that the system is efficient on data set with 10M triples and has more than a ten percent F-score improvement over our previous system.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 IEEE Fifth International Conference on Semantic Computing

自引率

0.00%

发文量