Joana G. Malaverri, André Santanchè, C. B. Medeiros
A provenance-based approach to evaluate data quality in eScience
Int. J. Metadata Semant. Ontologies, Journal Article, published 2014-02-01
DOI: 10.1504/IJMSO.2014.059127 (https://doi.org/10.1504/IJMSO.2014.059127)
Citations: 11
Abstract
Data quality is growing in relevance as a research topic. Quality assessment has been progressively incorporated into many business environments and into software engineering practices. eScience environments, however, complicate data quality assessment because of the multiplicity and heterogeneity of the data sources and scientific experts involved in a given problem. This paper deals with the evaluation of the quality of data managed by eScience applications. Our approach is based on data provenance, i.e., the history of the origins of a given data product and of the transformations applied to it. Our contributions include (a) the specification of a framework to track data provenance and use it to derive quality information, (b) a model for data provenance based on the Open Provenance Model, and (c) a methodology to evaluate the quality of data based on its provenance. Our proposal is validated experimentally by a prototype that takes advantage of the Taverna workflow system.