Improving data quality using techniques from human computation
Authors: Vikram Kumar Kirpalani, Saif-ur-Rahman Saif-ur-Rahman
Journal: Journal of Independent Studies and Research Computing
DOI: 10.31645/2014.12.1.10 (https://doi.org/10.31645/2014.12.1.10)
Citation count: 1
Abstract
DBpedia is an open data repository extracted from Wikipedia, a crowdsourced knowledge base. As a result, its information is vulnerable to inconsistency, grammatical errors, structural problems, and data type problems, among other issues. This research focuses on data type problems, in particular the problem of a single attribute containing multiple facts. We propose an approach that addresses the inconsistent behavior of the desired output using a concept hierarchy, i.e. parent-child relationships between entities, established by employing human computation. The confidence and trust of the output are calculated, and the resulting entity hierarchies can be maintained as triples. These triples can then be mapped onto other data to resolve the problem of implicit relationships between attributes.
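The abstract does not say how worker judgments are combined or how confidence is computed, so the following is only a minimal sketch under stated assumptions: each human judgment is a (child, parent) pair, the winning parent is chosen by simple majority, and confidence is the share of votes that agree with the winner. The function name `aggregate_votes`, the relation label `hasParent`, and the sample entities are hypothetical, and the trust measure mentioned in the abstract is not modeled here.

```python
from collections import Counter, defaultdict


def aggregate_votes(votes):
    """Turn (child, parent) judgments into hierarchy triples.

    Assumption: majority voting with an agreement ratio as confidence.
    This is an illustrative choice, not the method described in the paper.
    """
    by_child = defaultdict(Counter)
    for child, parent in votes:
        by_child[child][parent] += 1

    triples = []
    for child, counts in by_child.items():
        # Pick the parent with the most votes for this child.
        parent, top = counts.most_common(1)[0]
        # Confidence is the fraction of votes that agree with the winner.
        confidence = top / sum(counts.values())
        triples.append((child, "hasParent", parent, confidence))
    return triples


# Hypothetical judgments collected from human workers.
votes = [
    ("Karachi", "Sindh"),
    ("Karachi", "Sindh"),
    ("Karachi", "Pakistan"),
    ("Sindh", "Pakistan"),
]

triples = aggregate_votes(votes)
for child, relation, parent, confidence in triples:
    print(f"({child}, {relation}, {parent}) confidence={confidence:.2f}")
```

Storing each result as a (subject, predicate, object) tuple alongside its confidence keeps the hierarchy in the triple form the abstract describes, so low-agreement links can be filtered out or sent back for more judgments before they are mapped onto other data.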