Information Quality in Information Systems: Latest Publications

Provider issues in quality-constrained data provisioning
Information Quality in Information Systems Pub Date : 2005-06-17 DOI: 10.1145/1077501.1077507
P. Missier, Suzanne M. Embury
{"title":"Provider issues in quality-constrained data provisioning","authors":"P. Missier, Suzanne M. Embury","doi":"10.1145/1077501.1077507","DOIUrl":"https://doi.org/10.1145/1077501.1077507","url":null,"abstract":"Formal frameworks exist that allow service providers and users to negotiate the quality of a service. While these agreements usually include non-functional service properties, the quality of the information offered by a provider is neglected. Yet, in important application scenarios, notably in those based on the Service-Oriented computing paradigm, the outcome of complex workflows is directly affected by the quality of the data involved. In this paper, we propose a model for formal data quality agreements between data providers and data consumers, and analyze its feasibility by showing how a provider may take data quality constraints into account as part of its data provisioning process. Our analysis of the technical issues involved suggests that this is a complex problem in general, although satisfactory algorithmic and architectural solutions can be found under certain assumptions. To support this claim, we describe an algorithm for dealing with constraints on the completeness of a query result with respect to a reference data source, and outline an initial provider architecture for managing more general data quality constraints.","PeriodicalId":306187,"journal":{"name":"Information Quality in Information Systems","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122870674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
An event based framework for improving information quality that integrates baseline models, causal models and formal reference models
Information Quality in Information Systems Pub Date : 2005-06-17 DOI: 10.1145/1077501.1077510
Joseph Bugajski, R. Grossman, E. Sumner, Zhao Tang
{"title":"An event based framework for improving information quality that integrates baseline models, causal models and formal reference models","authors":"Joseph Bugajski, R. Grossman, E. Sumner, Zhao Tang","doi":"10.1145/1077501.1077510","DOIUrl":"https://doi.org/10.1145/1077501.1077510","url":null,"abstract":"We introduce a framework for improving information quality in complex distributed systems that integrates: 1) Analytic models that describe baseline values for attributes and combinations of attributes and components that detect statistically significant changes from baselines. These models determine whether a significant change has occurred, and if so, when. 2) Causal models that help determine why a statistically significant change has occurred and what its impact is. These models focus on the reasons for a change. 3) Formal business and technical reference models so that data and information quality problems are less likely to occur in the future. In this note, we focus on the first two types of models and describe how this framework applies to data quality problems associated with electronic payments transactions and highway traffic patterns.","PeriodicalId":306187,"journal":{"name":"Information Quality in Information Systems","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124361096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Effective and scalable solutions for mixed and split citation problems in digital libraries
Information Quality in Information Systems Pub Date : 2005-06-17 DOI: 10.1145/1077501.1077514
Dongwon Lee, Byung-Won On, Jaewoo Kang, Sanghyun Park
{"title":"Effective and scalable solutions for mixed and split citation problems in digital libraries","authors":"Dongwon Lee, Byung-Won On, Jaewoo Kang, Sanghyun Park","doi":"10.1145/1077501.1077514","DOIUrl":"https://doi.org/10.1145/1077501.1077514","url":null,"abstract":"In this paper, we consider two important problems that commonly occur in bibliographic digital libraries, which seriously degrade their data qualities: Mixed Citation (MC) problem (i.e., citations of different scholars with their names being homonyms are mixed together) and Split Citation (SC) problem (i.e., citations of the same author appear under different name variants). In particular, we investigate an effective yet scalable solution since citations in such digital libraries tend to be large-scale. After formally defining the problems and accompanying challenges, we present an effective solution that is based on the state-of-the-art sampling-based approximate join algorithm. Our claim is verified through preliminary experimental results.","PeriodicalId":306187,"journal":{"name":"Information Quality in Information Systems","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131839646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 99
A generalized cost optimal decision model for record matching
Information Quality in Information Systems Pub Date : 2004-06-18 DOI: 10.1145/1012453.1012457
Vassilios S. Verykios, G. Moustakides
{"title":"A generalized cost optimal decision model for record matching","authors":"Vassilios S. Verykios, G. Moustakides","doi":"10.1145/1012453.1012457","DOIUrl":"https://doi.org/10.1145/1012453.1012457","url":null,"abstract":"Record (or entity) matching or linkage is the process of identifying records in one or more data sources, that refer to the same real world entity or object. In record linkage, the ultimate goal of a decision model is to provide the decision maker with a tool for making decisions upon the actual matching status of a pair of records (i.e., documents, events, persons, cases, etc.). Existing models of record linkage rely on decision rules that minimize the probability of subjecting a case to clerical review, conditional on the probabilities of erroneous matches and erroneous non-matches. In practice though, (a) the value of an erroneous match is, in many applications, quite different from the value of an erroneous non-match, and (b) the cost and the probability of a misclassification, which is associated with the clerical review, is ignored in this way. In this paper, we present a decision model which is optimal, based on the cost of the record linkage operation, and general enough to accommodate multi-class or multi-decision case studies. We also present an example along with the results from applying the proposed model to large comparison spaces.","PeriodicalId":306187,"journal":{"name":"Information Quality in Information Systems","volume":"240 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122539892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
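The cost-based decision idea in the record-matching abstract above can be illustrated with a generic minimum-expected-cost rule. This is only a sketch under stated assumptions, not the authors' model: the three decisions, the cost table, and the names `expected_costs` and `best_decision` are hypothetical, and the match probability is assumed to be supplied by some upstream comparison step.

```python
# Hypothetical cost table: cost of each decision given the true status of the pair.
# "M" = the records truly match, "U" = they truly do not match.
COSTS = {
    "match":     {"M": 0.0, "U": 10.0},  # a false match is assumed to be expensive
    "non_match": {"M": 5.0, "U": 0.0},   # a false non-match is assumed to be cheaper
    "review":    {"M": 1.0, "U": 1.0},   # clerical review has a fixed assumed cost
}


def expected_costs(p_match, costs=COSTS):
    """Expected cost of every decision for a pair with match probability p_match."""
    return {
        decision: p_match * c["M"] + (1.0 - p_match) * c["U"]
        for decision, c in costs.items()
    }


def best_decision(p_match, costs=COSTS):
    """Pick the decision with the smallest expected cost."""
    exp = expected_costs(p_match, costs)
    return min(exp, key=exp.get)


if __name__ == "__main__":
    for p in (0.95, 0.5, 0.05):
        print(p, best_decision(p))
```

With this assumed table, unequal error costs shift the boundaries between the decisions, which is the effect the abstract says probability-only rules ignore.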
A framework for analysis of data freshness
Information Quality in Information Systems Pub Date : 2004-06-18 DOI: 10.1145/1012453.1012464
M. Bouzeghoub, Verónika Peralta
{"title":"A framework for analysis of data freshness","authors":"M. Bouzeghoub, Verónika Peralta","doi":"10.1145/1012453.1012464","DOIUrl":"https://doi.org/10.1145/1012453.1012464","url":null,"abstract":"Data freshness has been identified as one of the most important data quality attributes in information systems. This importance increases particularly in the context of distributed systems, composed of a large set of autonomous data sources, where integrating data having different freshness may lead to semantic problems. There are various definitions of data freshness in the literature, depending on the applications where they are used, as well as different metrics to measure them. This paper presents an analysis of these definitions and metrics and proposes a taxonomy based upon the nature of the data, the type of application and the synchronization policies underlying the multi-source information system. We analyze, in terms of the taxonomy, the way freshness is defined and used in several types of systems and we present some open research problems in the field of data freshness evaluation.","PeriodicalId":306187,"journal":{"name":"Information Quality in Information Systems","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132263186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 185
Detecting duplicate objects in XML documents
Information Quality in Information Systems Pub Date : 2004-06-18 DOI: 10.1145/1012453.1012456
Melanie Herschel, Felix Naumann
{"title":"Detecting duplicate objects in XML documents","authors":"Melanie Herschel, Felix Naumann","doi":"10.1145/1012453.1012456","DOIUrl":"https://doi.org/10.1145/1012453.1012456","url":null,"abstract":"The problem of detecting duplicate entities that describe the same real-world object (and purging them) is an important data cleansing task, necessary to improve data quality. For data stored in a flat relation, numerous solutions to this problem exist. As XML becomes increasingly popular for data representation, algorithms to detect duplicates in nested XML documents are required. In this paper, we present a domain-independent algorithm that effectively identifies duplicates in an XML document. The solution adopts a top-down traversal of the XML tree structure to identify duplicate elements on each level. Pairs of duplicate elements are detected using a thresholded similarity function, and are then clustered by computing the transitive closure. To minimize the number of pairwise element comparisons, an appropriate filter function is used. The similarity measure involves string similarity for pairs of strings, which is measured using their edit distance. To increase efficiency, we avoid the computation of edit distance for pairs of strings using three filtering methods subsequently. First experiments show that our approach detects XML duplicates accurately and efficiently.","PeriodicalId":306187,"journal":{"name":"Information Quality in Information Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115063588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 71
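The pipeline outlined in the duplicate-detection abstract above (thresholded edit-distance similarity, a cheap filter before the expensive comparison, then clustering by transitive closure) can be sketched on flat strings. This is an illustration under assumptions, not the paper's algorithm: it ignores the XML tree traversal, uses a single length-difference filter rather than the three filters the abstract mentions, and all function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similar(a, b, max_dist):
    """Thresholded similarity with an assumed length filter.

    The length difference is a lower bound on the edit distance, so pairs
    that fail the filter are rejected without running the full computation.
    """
    if abs(len(a) - len(b)) > max_dist:
        return False
    return edit_distance(a, b) <= max_dist


def cluster_duplicates(values, max_dist=1):
    """Group values by the transitive closure of the similarity relation."""
    parent = list(range(len(values)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(values)), 2):
        if similar(values[i], values[j], max_dist):
            parent[find(i)] = find(j)

    groups = {}
    for i, value in enumerate(values):
        groups.setdefault(find(i), []).append(value)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    print(cluster_duplicates(["Matrix", "Matrx", "Matrixx", "Signs"], max_dist=1))
```

In the sample call, "Matrx" and "Matrixx" are not within distance 1 of each other, but both are within distance 1 of "Matrix", so the transitive closure places all three in one cluster.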
Utility-based resolution of data inconsistencies
Information Quality in Information Systems Pub Date : 2004-06-18 DOI: 10.1145/1012453.1012460
Amihai Motro, P. Anokhin, A. Acar
{"title":"Utility-based resolution of data inconsistencies","authors":"Amihai Motro, P. Anokhin, A. Acar","doi":"10.1145/1012453.1012460","DOIUrl":"https://doi.org/10.1145/1012453.1012460","url":null,"abstract":"A virtual database system is software that provides unified access to multiple information sources. If the sources are overlapping in their contents and independently maintained, then the likelihood of inconsistent answers is high. Solutions are often based on ranking (which sorts the different answers according to recurrence) and on fusion (which synthesizes a new value from the different alternatives according to a specific formula). In this paper we argue that both methods are flawed, and we offer alternative solutions that are based on knowledge about the performance of the source data, including features such as recentness, availability, accuracy and cost. These features are combined in a flexible utility function that expresses the overall value of a data item to the user. Utility allows us to (1) define meaningful ranking on the inconsistent set of answers, and offer the top-ranked answer as a preferred answer; (2) determine whether a fusion value is indeed better than the initial values, by calculating its utility and comparing it to the utilities of the initial values; and (3) discover the best fusion: the fusion formula that optimizes the utility. The advantages of such performance-based and utility-driven ranking and fusion are considerable.","PeriodicalId":306187,"journal":{"name":"Information Quality in Information Systems","volume":"38 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131809355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 66
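The utility-driven ranking described in the abstract above can be illustrated with a weighted sum over the four features it names (recentness, availability, accuracy, cost). The linear form, the weights, the feature scales, and the names `Answer`, `utility`, and `rank` are all assumptions made for this sketch; the abstract only says the utility function is flexible and does not give its form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """One candidate answer with assumed feature scores in [0, 1]."""
    value: str
    recentness: float
    accuracy: float
    availability: float
    cost: float


# Hypothetical user preferences; cost counts against an answer.
WEIGHTS = {"recentness": 0.2, "accuracy": 0.5, "availability": 0.2, "cost": 0.1}


def utility(answer, weights=WEIGHTS):
    """Assumed linear utility: weighted benefits minus weighted cost."""
    return (
        weights["recentness"] * answer.recentness
        + weights["accuracy"] * answer.accuracy
        + weights["availability"] * answer.availability
        - weights["cost"] * answer.cost
    )


def rank(answers, weights=WEIGHTS):
    """Sort inconsistent answers from highest to lowest utility."""
    return sorted(answers, key=lambda a: utility(a, weights), reverse=True)


if __name__ == "__main__":
    candidates = [
        Answer("42 Main St", recentness=0.9, accuracy=0.6, availability=0.8, cost=0.5),
        Answer("42 Main Street", recentness=0.4, accuracy=0.95, availability=0.9, cost=0.2),
    ]
    for a in rank(candidates):
        print(a.value, round(utility(a), 3))
```

Under the same assumptions, a fused value would be kept only if its utility exceeded that of every original answer, which mirrors point (2) of the abstract; how the features of a fused value are derived is left open here.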
Execution of data mappers
Information Quality in Information Systems Pub Date : 2004-06-18 DOI: 10.1145/1012453.1012455
Paulo Carreira, H. Galhardas
{"title":"Execution of data mappers","authors":"Paulo Carreira, H. Galhardas","doi":"10.1145/1012453.1012455","DOIUrl":"https://doi.org/10.1145/1012453.1012455","url":null,"abstract":"Data mappers are essential operators for implementing data transformations supporting schema mapping and integration scenarios such as legacy data migration, ETL processes for data warehousing, data cleaning activities, and business integration initiatives. Despite their widespread use, no formalization of this important operation has been proposed so far. In this paper we propose the data mapper operator as an extension to the relational algebra. We supply a set of algebraic rewriting rules for optimizing queries that combine standard relational operators with data mappers. Finally, we propose algorithms for their efficient physical execution.","PeriodicalId":306187,"journal":{"name":"Information Quality in Information Systems","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117075214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Mining for patterns in contradictory data
Information Quality in Information Systems Pub Date : 2004-06-18 DOI: 10.1145/1012453.1012463
Heiko Müller, U. Leser, J. Freytag
{"title":"Mining for patterns in contradictory data","authors":"Heiko Müller, U. Leser, J. Freytag","doi":"10.1145/1012453.1012463","DOIUrl":"https://doi.org/10.1145/1012453.1012463","url":null,"abstract":"Information integration is often faced with the problem that different data sources represent the same set of the real-world objects, but give conflicting values for specific properties of these objects. Within this paper we present a model of such conflicts and describe an algorithm for efficiently detecting patterns of conflicts in a pair of overlapping data sources. The contradiction patterns we can find are a special kind of association rules, describing regularities in conflicts occurring together with certain attribute values, pairs of attribute values, or with other conflicts. Therefore, we adapt existing association rule mining algorithms for mining contradiction patterns. Such patterns are an important tool for human experts that try to find and resolve problems in data quality using domain knowledge. We present the results of applying our method on a real world data set from the life science domain and show how it helps to generate clean data for integrated data warehouses.","PeriodicalId":306187,"journal":{"name":"Information Quality in Information Systems","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127777671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Tackling inconsistencies in data integration through source preferences
Information Quality in Information Systems Pub Date : 2004-06-18 DOI: 10.1145/1012453.1012459
Giuseppe De Giacomo, D. Lembo, M. Lenzerini, R. Rosati
{"title":"Tackling inconsistencies in data integration through source preferences","authors":"Giuseppe De Giacomo, D. Lembo, M. Lenzerini, R. Rosati","doi":"10.1145/1012453.1012459","DOIUrl":"https://doi.org/10.1145/1012453.1012459","url":null,"abstract":"Dealing with inconsistencies is one of the main challenges in data integration systems, where data stored in the local sources may violate integrity constraints specified at the global level. Recently, declarative approaches have been proposed to deal with such a problem. Existing declarative proposals do not take into account preference assertions specified between sources when trying to solve inconsistency. On the other hand, the designer of an integration system may often include in the specification preference rules indicating the quality of data sources. In this paper, we consider Local-As-View integration systems, and propose a method that allows one to assign formal semantics to a data integration system whose declarative specification includes information on source preferences. To the best of our knowledge, our approach is the first one to consider in a declarative way information on source quality for dealing with inconsistent data in Local-As-View integration systems.","PeriodicalId":306187,"journal":{"name":"Information Quality in Information Systems","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125920253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26