ACM Journal of Data and Information Quality (JDIQ): Latest Articles

Editorial: Special Issue on Deep Learning for Data Quality
ACM Journal of Data and Information Quality (JDIQ) Pub Date: 2022-06-24 DOI: 10.1145/3513135
Donatello Santoro, Saravanan Thirumuruganathan, Paolo Papotti
Abstract: This editorial summarizes the content of the Special Issue on Deep Learning for Data Quality of the Journal of Data and Information Quality (JDIQ).
Citations: 0
AI Compliance – Challenges of Bridging Data Science and Law
ACM Journal of Data and Information Quality (JDIQ) Pub Date: 2022-05-05 DOI: 10.1145/3531532
P. Hacker, Felix Naumann, Tobias Friedrich, Stefan Grundmann, Anja Lehmann, Herbert Zech
Abstract: This vision article outlines the main building blocks of what we term AI Compliance, an effort to bridge two complementary research areas: computer science and the law. Such research has the goal to model, measure, and affect the quality of AI artifacts, such as data, models, and applications, to then facilitate adherence to legal standards.
Citations: 1
Design and Implementation of a Historical German Firm-level Financial Database
ACM Journal of Data and Information Quality (JDIQ) Pub Date: 2022-04-30 DOI: 10.1145/3531533
Dennis Gram, Pantelis Karapanagiotis, Marius Liebald, U. Walz
Abstract: Broad, long-term financial, and economic datasets are scarce resources, particularly in the European context. In this article, we present an approach for an extensible data model that is adaptable to future changes in technologies and sources. This model may constitute a basis for digitized and structured long-term historical datasets for different jurisdictions and periods. The data model covers the specific peculiarities of historical financial and economic data and is flexible enough to reach out for data of different types (quantitative as well as qualitative) from different historical sources, hence, achieving extensibility. Furthermore, we outline a relational implementation of this approach based on historical German firm and stock market data from 1920 to 1932.
Citations: 0
Contextual Data Cleaning with Ontology Functional Dependencies
ACM Journal of Data and Information Quality (JDIQ) Pub Date: 2022-04-21 DOI: 10.1145/3524303
Zheng Zheng, Longtao Zheng, Morteza Alipourlangouri, Fei Chiang, Lukasz Golab, Jaroslaw Szlichta, S. Baskaran
Abstract: Functional Dependencies define attribute relationships based on syntactic equality, and when used in data cleaning, they erroneously label syntactically different but semantically equivalent values as errors. We explore dependency-based data cleaning with Ontology Functional Dependencies (OFDs), which express semantic attribute relationships such as synonyms defined by an ontology. We study the theoretical foundations of OFDs, including sound and complete axioms and a linear-time inference procedure. We then propose an algorithm for discovering OFDs (exact ones and ones that hold with some exceptions) from data that uses the axioms to prune the search space. Toward enabling OFDs as data quality rules in practice, we study the problem of finding minimal repairs to a relation and ontology with respect to a set of OFDs. We demonstrate the effectiveness of our techniques on real datasets and show that OFDs can significantly reduce the number of false positive errors in data cleaning techniques that rely on traditional Functional Dependencies.
Citations: 2
Negative Insurance Claim Generation Using Distance Pooling on Positive Diagnosis-Procedure Bipartite Graphs
ACM Journal of Data and Information Quality (JDIQ) Pub Date: 2022-04-18 DOI: 10.1145/3531347
Md Enamul Haque, M. E. Tozal
Abstract: Negative samples in health and medical insurance domain refer to fraudulent or erroneous insurance claims that may include inconsistent diagnosis-procedure relations with respect to a medical coding system. Unfortunately, only a few datasets are publicly available for research in health insurance domain, yet none reports any negative claims. However, negative claims are essential not only to develop new machine learning approaches but also to test and validate automated artificial intelligence systems deployed by insurance providers. In this study, we introduce a synthetic negative claim generation procedure based on the bipartite graph representations of positive claims. Our empirical results demonstrate promising outcomes that will improve the development and evaluation processes of machine learning approaches in healthcare, where negative samples are required, but not available. Moreover, the proposed scheme can be applied to other domains, where bipartite graph representations are meaningful and negative samples are lacking.
Citations: 0
A Cluster-then-label Approach for Few-shot Learning with Application to Automatic Image Data Labeling
ACM Journal of Data and Information Quality (JDIQ) Pub Date: 2022-03-04 DOI: 10.1145/3491232
Renzhi Wu, Nilaksh Das, Sanya Chaba, Sakshi Gandhi, Duen Horng Chau, Xu Chu
Abstract: Few-shot learning (FSL) aims at learning to generalize from only a small number of labeled examples for a given target task. Most current state-of-the-art FSL methods typically have two limitations. First, they usually require access to a source dataset (in a similar domain) with abundant labeled examples, which may not always be possible due to privacy concerns and copyright issues. Second, they typically do not offer any estimation of the generalization error on the target FSL task, because the handful of labeled examples must be used for training and cannot spare a validation subset. In this article, we propose a cluster-then-label approach to perform few-shot learning. Our approach does not require access to the labeled source dataset and provides an estimation of generalization error. We show empirically, on four benchmark datasets, that our approach provides competitive predictive performance to state-of-the-art FSL approaches and our generalization error estimation is accurate. Finally, we explore the application of our proposed method to automatic image data labeling. We compare our method with existing automatic data labeling systems. The end-to-end performance of our method outperforms the state-of-the-art automatic data labeling system Snuba by 26% and is only 7% away from the fully supervised upper bound.
Citations: 6
Machine Learning and Data Cleaning: Which Serves the Other?
ACM Journal of Data and Information Quality (JDIQ) Pub Date: 2022-03-04 DOI: 10.1145/3506712
I. Ilyas, Theodoros Rekatsinas
Abstract: The last few years witnessed significant advances in building automated or semi-automated data quality, data cleaning and data integration systems powered by machine learning (ML). In parallel, large deployment of ML systems in business, science, environment and various other areas started to realize the strong dependency on the quality of the input data to these ML models to get reliable predictions or insights. That dual relationship between ML and data cleaning has been addressed by many recent research works under terms such as “Data cleaning for ML” and “ML for automating data cleaning and data preparation”. In this article, we highlight this symbiotic relationship between ML and data cleaning and discuss few challenges that require collaborative efforts of multiple research communities.
Citations: 9
Which Conference Is That? A Case Study in Computer Science
ACM Journal of Data and Information Quality (JDIQ) Pub Date: 2022-03-04 DOI: 10.1145/3519031
C. Demetrescu, Irene Finocchi, Andrea Ribichini, M. Schaerf
Abstract: Conferences play a major role in some disciplines such as computer science and are often used in research quality evaluation exercises. Differently from journals and books, for which ISSN and ISBN codes provide unambiguous keys, recognizing the conference series in which a paper was published is a rather complex endeavor: There is no unique code assigned to conferences, and the way their names are written may greatly vary across years and catalogs. In this article, we propose a technique for the entity resolution of conferences based on the analysis of different semantic parts of their names. We present the results of an investigation of our technique on a dataset of 42,395 distinct computer science conference names excerpted from the DBLP computer science repository, which we automatically link to different authority files. With suitable data cleaning, the precision of our record linkage algorithm can be as high as 94%. A comparison with results obtainable using state-of-the-art general-purpose record linkage algorithms rounds off the article, showing that our ad hoc solution largely outperforms them in terms of the quality of the results.
Citations: 1
A Web Information Extraction Framework with Adaptive and Failure Prediction Feature
ACM Journal of Data and Information Quality (JDIQ) Pub Date: 2022-02-18 DOI: 10.1145/3495008
Sudhir Kumar Patnaik, C. Babu
Abstract: The amount of information available on the internet today requires effective information extraction and processing to offer hyper-personalized user experiences. Inability to extract information by using traditional and machine learning techniques due to dynamic changes in website layout pose significant challenges to the technical community to keep up with such changes. The focus of existing machine learning-based information extraction framework is only on information extraction by using core extraction logic that is susceptible to website changes, thus missing out core features such as ability to handle proactive failure prediction and intelligent information extraction capabilities. The aim of this article is to build a robust and intelligent information extraction framework with the ability not only to proactively predict website failure but also automatically extract information using deep-learning techniques using You Only Look Once and Long Short-term Memory (LSTM) networks. The proactive detection using LSTM detects new location of the web page due to layout changes and enables automatic extraction of information of the new web page. A real-world case with retail website for intelligent information extraction and an offline experimentation environment is setup to demonstrate proactive failure prediction and automatic extraction resulting in high failure prediction, precision and recall of object detection and information extraction.
Citations: 0
Editorial: Special Issue on Data Transparency—Uses Cases and Applications
ACM Journal of Data and Information Quality (JDIQ) Pub Date: 2022-02-11 DOI: 10.1145/3494455
M. Barhamgi, E. Bertino
Abstract: Advances in Artificial Intelligence (AI) and mobile and Internet technologies have been progressively reshaping our lives over the past few years. The applications of the Internet of Things and cyber-physical systems today touch almost all aspects of our daily lives, including healthcare (e.g., remote patient monitoring environments), leisure (e.g., smart entertainment spaces), and work (e.g., smart manufacturing and asset management). For many of us, social media have become the rule rather than the exception as the way to interact, socialize, and exchange information. AI-powered systems have become a reality and started to affect our lives in important ways. These systems and services collect huge amounts of data about us and exploit it for various purposes that could affect our lives positively or negatively. Even though most of these systems claim to abide by data protection regulations and ethics, data misuse incidents keep making the headlines. In this new digital world, data transparency for end users is becoming a fundamental aspect to consider when designing, implementing, and deploying a system, service, or software [1, 3, 4]. Transparency allows users to track down and follow how their data are collected, transmitted, stored, processed, exploited, and serviced. It also allows them to verify how fairly they are treated by algorithms, software, and systems that affect their lives. Data transparency is a complex concept that is interpreted and approached in different ways by different research communities and bodies. A comprehensive definition of data transparency is proposed by Bertino et al. as “the ability of subjects to effectively gain access to all information related to data used in processes and decisions that affect the subjects” [2].
Citations: 1