Journal of Data and Information Quality (JDIQ)最新文献_第9页

Challenges of Open Data Quality 开放数据质量的挑战

Journal of Data and Information Quality (JDIQ) Pub Date : 2017-09-15 DOI: 10.1145/3110291

D. Corsar, P. Edwards

引用次数: 11

Challenge Paper 挑战的论文

Journal of Data and Information Quality (JDIQ) Pub Date : 2017-09-15 DOI: 10.1145/3106236

P. Arbuckle, E. Kahn, Adam Kriesberg

{"title":"Challenge Paper","authors":"P. Arbuckle, E. Kahn, Adam Kriesberg","doi":"10.1145/3106236","DOIUrl":"https://doi.org/10.1145/3106236","url":null,"abstract":"Life Cycle Assessment is a modeling approach to assess the environmental aspects and potential environmental impacts (e.g., use of resources and the environmental consequences of releases) throughout a product’s life cycle from raw material acquisition through production, use, end-oflife treatment, recycling and final disposal (i.e., cradle-to-grave) (ISO 14040). It has been employed in recent years by industry and governments to address growing interest about the true costs of resource use, environmental impact, and other externalities of economic activity. Inherently multidisciplinary, LCA draws and synthesizes information from the social and physical sciences. This breadth within LCA models (often referred to as “data” by the community of practitioners) can make collecting and synthesizing information the most expensive component of an analysis and drives the need for model reuse. However, the LCA community is faced with a major challenge in its capacity to produce sufficient documentation and metadata to determine representation of these models and to reuse them correctly, an issue broadly affecting researchers across disciplines. Tenopir et al. (2011, 2015) found in each of two surveys of scientific data management and sharing practices that researchers do not feel equipped to generate metadata to facilitate reuse of their data. Furthermore, some researchers reported limited knowledge of available standards to describe data. The challenge in capacity in the LCA community is driven by two factors: the nascent state of standardization in LCA modeling and the strong focus on research and results for funded LCA work. Standardization serves to create a foundational set of rules and guidelines to support","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"15 1","pages":"1 - 4"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78750286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

An Exploratory Case Study to Understand Primary Care Users and Their Data Quality Tradeoffs 了解初级保健用户及其数据质量权衡的探索性案例研究

Journal of Data and Information Quality (JDIQ) Pub Date : 2017-07-10 DOI: 10.1145/3058750

J. St-Maurice, C. Burns

{"title":"An Exploratory Case Study to Understand Primary Care Users and Their Data Quality Tradeoffs","authors":"J. St-Maurice, C. Burns","doi":"10.1145/3058750","DOIUrl":"https://doi.org/10.1145/3058750","url":null,"abstract":"Primary care data is an important part of the evolving healthcare ecosystem. Generally, users in primary care are expected to provide excellent patient care and record high-quality data. In practice, users must balance sets of priorities regarding care and data. The goal of this study was to understand data quality tradeoffs between timeliness, validity, completeness, and use among primary care users. As a case study, data quality measures and metrics are developed through a focus group session with managers. After calculating and extracting measurements of data quality from six years of historic data, each measure was modeled with logit binomial regression to show correlations, characterize tradeoffs, and investigate data quality interactions. Measures and correlations for completeness, use, and timeliness were calculated for 196,967 patient encounters. Based on the analysis, there was a positive relationship between validity and completeness, and a negative relationship between timeliness and use. Use of data and reductions in entry delay were positively associated with completeness and validity. Our results suggest that if users are not provided with sufficient time to record data as part of their regular workflow, they will prioritize spending available time with patients. As a measurement of a primary care system's effectiveness, the negative correlation between use and timeliness points to a self-reinforcing relationship that provides users with little external value. In the future, additional data can be generated from comparable organizations to test several new hypotheses about primary care users.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"34 1","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2017-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79830650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8

The Challenge of Quality in Social Computation 社会计算质量的挑战

Journal of Data and Information Quality (JDIQ) Pub Date : 2017-06-30 DOI: 10.1145/3041762

Milan Markovic, P. Edwards

{"title":"The Challenge of Quality in Social Computation","authors":"Milan Markovic, P. Edwards","doi":"10.1145/3041762","DOIUrl":"https://doi.org/10.1145/3041762","url":null,"abstract":"Interactive web technologies now enable a host of so-called social computations, which can address challenges that are beyond the capabilities of machines alone. Notable examples of such social computation systems include Galaxy Zoo,1 BeeWatch,2 and Ushahidi,3 operating in fields as diverse as classification of newly discovered galaxies, monitoring of bee populations, and disaster management. A system for earthquake prediction using social media [Sakaki et al. 2010] illustrates how such computations can also emerge on social networking platforms. Social computations can be modeled as a complex collection of structured activities (i.e. workflows) that represent a blend of human and machine tasks, with associated objectives and reward mechanisms. In our previous work [Markovic et al. 2013; Markovic 2016] we argued that recording provenance of social computation workflows would enhance decision-making support for all associated stakeholders; these include initiators, participants, and beneficiaries of such computations. In the next section, we will briefly introduce the key characteristics of complex social computation systems before discussing why quality assessments in such a context are challenging.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"20 1","pages":"1 - 3"},"PeriodicalIF":0.0,"publicationDate":"2017-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87712900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

The Data Repurposing Challenge 数据再利用挑战

Journal of Data and Information Quality (JDIQ) Pub Date : 2017-06-30 DOI: 10.1145/3022698

Philip Woodall

引用次数: 5

Experience 经验

Journal of Data and Information Quality (JDIQ) Pub Date : 2017-06-30 DOI: 10.1145/3092700

Leena Al-Hussaini

引用次数: 5

Dependable Data Repairing with Fixing Rules 可靠的数据修复与固定规则

Journal of Data and Information Quality (JDIQ) Pub Date : 2017-06-30 DOI: 10.1145/3041761

Jiannan Wang, N. Tang

{"title":"Dependable Data Repairing with Fixing Rules","authors":"Jiannan Wang, N. Tang","doi":"10.1145/3041761","DOIUrl":"https://doi.org/10.1145/3041761","url":null,"abstract":"One of the main challenges that data-cleaning systems face is to automatically identify and repair data errors in a dependable manner. Though data dependencies (also known as integrity constraints) have been widely studied to capture errors in data, automated and dependable data repairing on these errors has remained a notoriously difficult problem. In this work, we introduce an automated approach for dependably repairing data errors, based on a novel class of fixing rules. A fixing rule contains an evidence pattern, a set of negative patterns, and a fact value. The heart of fixing rules is deterministic: given a tuple, the evidence pattern and the negative patterns of a fixing rule are combined to precisely capture which attribute is wrong, and the fact indicates how to correct this error. We study several fundamental problems associated with fixing rules and establish their complexity. We develop efficient algorithms to check whether a set of fixing rules are consistent and discuss approaches to resolve inconsistent fixing rules. We also devise efficient algorithms for repairing data errors using fixing rules. Moreover, we discuss approaches on how to generate a large number of fixing rules from examples or available knowledge bases. We experimentally demonstrate that our techniques outperform other automated algorithms in terms of the accuracy of repairing data errors, using both real-life and synthetic data.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"27 1","pages":"1 - 34"},"PeriodicalIF":0.0,"publicationDate":"2017-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74181624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

QDflows

Journal of Data and Information Quality (JDIQ) Pub Date : 2017-06-30 DOI: 10.1145/3064173

Sabrina Abdellaoui, Fahima Nader, R. Chalal

{"title":"QDflows","authors":"Sabrina Abdellaoui, Fahima Nader, R. Chalal","doi":"10.1145/3064173","DOIUrl":"https://doi.org/10.1145/3064173","url":null,"abstract":"In the big data era, data integration is becoming increasingly important. It is usually handled by data flows processes that extract, transform, and clean data from several sources, and populate the data integration system (DIS). Designing data flows is facing several challenges. In this article, we deal with data quality issues such as (1) specifying a set of quality rules, (2) enforcing them on the data flow pipeline to detect violations, and (3) producing accurate repairs for the detected violations. We propose QDflows, a system for designing quality-aware data flows that considers the following as input: (1) a high-quality knowledge base (KB) as the global schema of integration, (2) a set of data sources and a set of validated users’ requirements, (3) a set of defined mappings between data sources and the KB, and (4) a set of quality rules specified by users. QDflows uses an ontology to design the DIS schema. It offers the ability to define the DIS ontology as a module of the knowledge base, based on validated users’ requirements. The DIS ontology model is then extended with multiple types of quality rules specified by users. QDflows extracts and transforms data from sources to populate the DIS. It detects violations of quality rules enforced on the data flows, constructs repair patterns, searches for horizontal and vertical matches in the knowledge base, and performs an automatic repair when possible or generates possible repairs. It interactively involves users to validate the repair process before loading the clean data into the DIS. Using real-life and synthetic datasets, the DBpedia and Yago knowledge bases, we experimentally evaluate the generality, effectiveness, and efficiency of QDflows. We also showcase an interactive tool implementing our system.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"10 1","pages":"1 - 39"},"PeriodicalIF":0.0,"publicationDate":"2017-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80759605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Ontological Multidimensional Data Models and Contextual Data Quality 本体多维数据模型与上下文数据质量

Journal of Data and Information Quality (JDIQ) Pub Date : 2017-04-01 DOI: 10.1145/3148239

L. Bertossi, Mostafa Milani

{"title":"Ontological Multidimensional Data Models and Contextual Data Quality","authors":"L. Bertossi, Mostafa Milani","doi":"10.1145/3148239","DOIUrl":"https://doi.org/10.1145/3148239","url":null,"abstract":"Data quality assessment and data cleaning are context-dependent activities. Motivated by this observation, we propose the Ontological Multidimensional Data Model (OMD model), which can be used to model and represent contexts as logic-based ontologies. The data under assessment are mapped into the context for additional analysis, processing, and quality data extraction. The resulting contexts allow for the representation of dimensions, and multidimensional data quality assessment becomes possible. At the core of a multidimensional context, we include a generalized multidimensional data model and a Datalog± ontology with provably good properties in terms of query answering. These main components are used to represent dimension hierarchies, dimensional constraints, and dimensional rules and define predicates for quality data specification. Query answering relies on and triggers navigation through dimension hierarchies and becomes the basic tool for the extraction of quality data. The OMD model is interesting per se beyond applications to data quality. It allows for a logic-based and computationally tractable representation of multidimensional data, extending previous multidimensional data models with additional expressive power and functionalities.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"32 1","pages":"1 - 36"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81010519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 21

A Probabilistically Integrated System for Crowd-Assisted Text Labeling and Extraction 群体辅助文本标注与提取的概率集成系统

Journal of Data and Information Quality (JDIQ) Pub Date : 2017-02-09 DOI: 10.1145/3012003

S. Goldberg, D. Wang, Christan Earl Grant

引用次数: 12