{"title":"Scalable Methods for Measuring the Connectivity and Quality of Large Numbers of Linked Datasets","authors":"M. Mountantonakis, Yannis Tzitzikas","doi":"10.1145/3165713","DOIUrl":"https://doi.org/10.1145/3165713","url":null,"abstract":"Although the ultimate objective of Linked Data is linking and integration, it is not currently evident how connected the current Linked Open Data (LOD) cloud is. In this article, we focus on methods, supported by special indexes and algorithms, for performing measurements related to the connectivity of more than two datasets that are useful in various tasks including (a) Dataset Discovery and Selection; (b) Object Coreference, i.e., for obtaining complete information about a set of entities, including provenance information; (c) Data Quality Assessment and Improvement, i.e., for assessing the connectivity between any set of datasets and monitoring their evolution over time, as well as for estimating data veracity; (d) Dataset Visualizations; and various other tasks. Since it would be prohibitively expensive to perform all these measurements in a naïve way, in this article, we introduce indexes (and their construction algorithms) that can speed up such tasks. In brief, we introduce (i) a namespace-based prefix index, (ii) a sameAs catalog for computing the symmetric and transitive closure of the owl:sameAs relationships encountered in the datasets, (iii) a semantics-aware element index (that exploits the aforementioned indexes), and, finally, (iv) two lattice-based incremental algorithms for speeding up the computation of the intersection of URIs of any set of datasets. For enhancing scalability, we propose parallel index construction algorithms and parallel lattice-based incremental algorithms, we evaluate the achieved speedup using either a single machine or a cluster of machines, and we provide insights regarding the factors that affect efficiency. Finally, we report measurements about the connectivity of the (billion triples-sized) LOD cloud that have never been carried out so far.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"124 1","pages":"1 - 49"},"PeriodicalIF":0.0,"publicationDate":"2018-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78178169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Requirements for Data Quality Metrics","authors":"Bernd Heinrich, Diana Hristova, Mathias Klier, Alexander Schiller, Michael Szubartowicz","doi":"10.1145/3148238","DOIUrl":"https://doi.org/10.1145/3148238","url":null,"abstract":"Data quality and especially the assessment of data quality have been intensively discussed in research and practice alike. To support an economically oriented management of data quality and decision making under uncertainty, it is essential to assess the data quality level by means of well-founded metrics. However, if not adequately defined, these metrics can lead to wrong decisions and economic losses. Therefore, based on a decision-oriented framework, we present a set of five requirements for data quality metrics. These requirements are relevant for a metric that aims to support an economically oriented management of data quality and decision making under uncertainty. We further demonstrate the applicability and efficacy of these requirements by evaluating five data quality metrics for different data quality dimensions. Moreover, we discuss practical implications when applying the presented requirements.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"35 1","pages":"1 - 32"},"PeriodicalIF":0.0,"publicationDate":"2018-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90116160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experience","authors":"Kyu Han Koh, Eric Fouh, Mohammed F. Farghally, Hossameldin Shahin, C. Shaffer","doi":"10.1145/3148240","DOIUrl":"https://doi.org/10.1145/3148240","url":null,"abstract":"We present lessons learned related to data collection and analysis from 5 years of experience with the eTextbook system OpenDSA. The use of such cyberlearning systems is expanding rapidly in both formal and informal educational settings. Although the precise issues related to any such project are idiosyncratic based on the data collection technology and goals of the project, certain types of data collection problems will be common. We begin by describing the nature of the data transmitted between the student’s client machine and the database server, and our initial database schema for storing interaction log data. We describe many problems that we encountered, with the nature of the problems categorized as syntactic-level data collection issues, issues with relating events to users, or issues with tracking users over time. Relating events to users and tracking the time spent on tasks are both prerequisites to converting syntactic-level interaction streams to semantic-level behavior needed for higher-order analysis of the data. Finally, we describe changes made to our database schema that helped to resolve many of the issues that we had encountered. These changes help advance our ultimate goal of encouraging a change from ineffective learning behavior by students to more productive behavior.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"25 1","pages":"1 - 10"},"PeriodicalIF":0.0,"publicationDate":"2018-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78228987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges in Enabling Quality of Analytics in the Cloud","authors":"Hong Linh Truong, A. Murguzur, Erica Y. Yang","doi":"10.1145/3138806","DOIUrl":"https://doi.org/10.1145/3138806","url":null,"abstract":"Currently, domain scientists (DSs) face challenges in managing quality across multiple data analytics contexts (DACs). We identify and define quality of analytics (QoA) in dynamic and diverse environments, e.g., based on cloud computing resources for big data sources, as a composition of quality of data (data quality), performance, and cost, to name just the main factors. QoA is a complex matter and not just about quality of data or performance, which are typically considered separately when evaluating existing data analytics frameworks/algorithms. Frequently, the DS needs to utilize multiple frameworks to run different (sub)analytics, and, at the same time, the sub-analytics executed in these frameworks exchange inputs and outputs each other. In these cases, we observe different DACs, where a DAC refers to a particular situation in which the DS works with a specific framework to run a sub-analytics carried out by pipeline(s) or tasks in a pipeline. Each DAC has a set of interactions in the following categories:","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"6 1","pages":"1 - 4"},"PeriodicalIF":0.0,"publicationDate":"2018-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82724178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Validating Data Quality Actions in Scoring Processes","authors":"C. Cappiello, C. Cerletti, C. Fratto, B. Pernici","doi":"10.1145/3141248","DOIUrl":"https://doi.org/10.1145/3141248","url":null,"abstract":"Data quality has gained momentum among organizations upon the realization that poor data quality might cause failures and/or inefficiencies, thus compromising business processes and application results. However, enterprises often adopt data quality assessment and improvement methods based on practical and empirical approaches without conducting a rigorous analysis of the data quality issues and outcome of the enacted data quality improvement practices. In particular, data quality management, especially the identification of the data quality dimensions to be monitored and improved, is performed by knowledge workers on the basis of their skills and experience. Control methods are therefore designed on the basis of expected and evident quality problems; thus, these methods may not be effective in dealing with unknown and/or unexpected problems. This article aims to provide a methodology, based on fault injection, for validating the data quality actions used by organizations. We show how it is possible to check whether the adopted techniques properly monitor the real issues that may damage business processes. At this stage, we focus on scoring processes, i.e., those in which the output represents the evaluation or ranking of a specific object. We show the effectiveness of our proposal by means of a case study in the financial risk management area.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"48 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2018-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88882387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editor-in-Chief (January 2014-May 2017) Farewell Report","authors":"L. Raschid","doi":"10.1145/3143313","DOIUrl":"https://doi.org/10.1145/3143313","url":null,"abstract":"","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"140 1","pages":"1 - 2"},"PeriodicalIF":0.0,"publicationDate":"2017-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80381236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foreword from the New JDIQ Editor-in-Chief","authors":"T. Catarci","doi":"10.1145/3143316","DOIUrl":"https://doi.org/10.1145/3143316","url":null,"abstract":"","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"21 1","pages":"1 - 2"},"PeriodicalIF":0.0,"publicationDate":"2017-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87262206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Quality Challenges in Social Spam Research","authors":"Nour El-Mawass, Saad S. Alaboodi","doi":"10.1145/3090057","DOIUrl":"https://doi.org/10.1145/3090057","url":null,"abstract":"","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"45 1","pages":"1 - 4"},"PeriodicalIF":0.0,"publicationDate":"2017-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90582679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster-Based Quality-Aware Adaptive Data Compression for Streaming Data","authors":"Aseel Basheer, Kewei Sha","doi":"10.1145/3122863","DOIUrl":"https://doi.org/10.1145/3122863","url":null,"abstract":"Wireless sensor networks (WSNs) are widely applied in data collection applications. Energy efficiency is one of the most important design goals of WSNs. In this article, we examine the tradeoffs between the energy efficiency and the data quality. First, four attributes used to evaluate data quality are formally defined. Then, we propose a novel data compression algorithm, Quality-Aware Adaptive data Compression (QAAC), to reduce the amount of data communication to save energy. QAAC utilizes an adaptive clustering algorithm to build clusters from dataset; then a code for each cluster is generated and stored in a Huffman encoding tree. The encoding algorithm encodes the original dataset based on the Haffman encoding tree. An improvement algorithm is also designed to reduce the information loss when data are compressed. After the encoded data, the Huffman encoding tree and parameters used in the improvement algorithm have been received at the sink, a decompression algorithm is used to retrieve the approximation of the original dataset. The performance evaluation shows that QAAC is efficient and achieves a much higher compression ratio than lossy and lossless compression algorithms, while it has much smaller information loss than lossy compression algorithms.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"70 1","pages":"1 - 33"},"PeriodicalIF":0.0,"publicationDate":"2017-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80940529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}