{"title":"Text2EL+: Expert Guided Event Log Enrichment using Unstructured Text","authors":"D. T. K. Geeganage, M. Wynn, A. Hofstede","doi":"10.1145/3640018","DOIUrl":"https://doi.org/10.1145/3640018","url":null,"abstract":"Through the application of process mining, business processes can be improved on the basis of process execution data captured in event logs. Naturally, the quality of this data determines the quality of the improvement recommendations. Improving data quality is non-trivial and there is great potential to exploit unstructured text, e.g. from notes, reviews, and comments, for this purpose and to enrich event logs. To this end, this paper introduces Text2EL+ a three-phase approach to enrich event logs using unstructured text. In its first phase, events and (case and event) attributes are derived from unstructured text linked to organisational processes. In its second phase, these events and attributes undergo a semantic and contextual validation before their incorporation in the event log. In its third and final phase, recognising the importance of human domain expertise, expert guidance is used to further improve data quality by removing redundant and irrelevant events. Expert input is used to train a Named Entity Recognition (NER) model with customised tags to detect event log elements. The approach applies natural language processing techniques, sentence embeddings, training pipelines and models, as well as contextual and expression validation. Various unstructured clinical notes associated with a healthcare case study were analysed and completeness, concordance, and correctness of the derived event log elements were evaluated through experiments. The results show that the proposed method is feasible and applicable.","PeriodicalId":44355,"journal":{"name":"ACM Journal of Data and Information Quality","volume":"5 8","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139440108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Catalog of Consumer IoT Device Characteristics for Data Quality Estimation
Valentina Golendukhina, Harald Foidl, Daniel Hörl, Michael Felderer
ACM Journal of Data and Information Quality, published 2024-01-09. DOI: 10.1145/3639708

Abstract: The Internet of Things (IoT) is rapidly growing and spreading across different markets, including the consumer market and consumer IoT (CIoT). The large variety of gadgets and their availability make CIoT more and more influential, especially in the wearable and smart home domains. However, the large variety of devices and their inconsistent quality, due to varying hardware costs, affect the data produced by such devices. In this article, a catalog of CIoT properties is introduced, which enables the prediction of data quality. The data quality catalog contains six categories and 21 properties with descriptions and trust score calculation methods. A diagramming tool is implemented to support and facilitate the evaluation process. The tool was assessed in an experimental setting with 14 users and received positive feedback. Additionally, we provide an exemplary application to smartwatch devices and compare the results obtained with the approach against users' evaluations, based on feedback from 158 smartwatch owners. As a result, the method-based ranking does not match the evaluations of regular users; however, it yields outcomes comparable to the assessment conducted by experienced users.

AI Explainability and Acceptance: A Case Study for Underwater Mine Hunting
G. J. Richard, J. Habonneau, D. Gueriot (Thales DMS; IMT Atlantique, France)
ACM Journal of Data and Information Quality, published 2023-12-21. DOI: 10.1145/3635113

Abstract: In critical operational contexts such as Mine Warfare, Automatic Target Recognition (ATR) algorithms are still hardly accepted. The complexity of their decision-making hampers understanding of predictions, despite performance approaching that of human experts. Much research has been done in the field of Explainable Artificial Intelligence (XAI) to avoid this "black box" effect. This field of research attempts to provide explanations for the decision-making of complex networks to promote their acceptability. Most explanation methods applied to image classifier networks provide heat maps, which highlight pixels according to their importance in the decision. In this work, we first implement different XAI methods for the automatic classification of Synthetic Aperture Sonar (SAS) images by convolutional neural networks (CNNs). These methods follow a post-hoc approach. We study and compare the resulting heat maps. Secondly, we evaluate the benefits and usefulness of explainability in an operational framework for collaboration. To do this, user tests are carried out with different levels of assistance, ranging from classification by an unaided operator to classification with explained ATR. These tests allow us to study whether heat maps are useful in this context. The results show that the usefulness of heat-map explanations is disputed among operators. The presence of heat maps does not increase the quality of the classifications; on the contrary, it even increases response time. Nevertheless, half of the operators see some usefulness in heat-map explanations.

Data Quality Assessment through a Preference Model
Julian Le Deunf, Arwa Khannoussi, Laurent Lecornu, Patrick Meyer, John Puentes
ACM Journal of Data and Information Quality, published 2023-11-29. DOI: 10.1145/3632407

Abstract: Evaluating the quality of data is a multi-dimensional problem that quite frequently depends on the perspective of an expected use or final purpose of the data. Numerous works have explored the well-known specification of data quality dimensions in various application domains, without addressing the inter-dependencies and aggregation of quality attributes for decision support. In this work, we therefore propose a context-dependent formal process to evaluate the quality of data, which integrates a preference model from the field of Multi-Criteria Decision Aiding. The parameters of this preference model are determined through interviews with work-domain experts. We demonstrate the value of the proposal on a case study related to the evaluation of the quality of hydrographic survey data.

{"title":"Editorial: Special Issue on Quality Aspects of Data Preparation","authors":"Marco Console, Maurizio Lenzerini","doi":"10.1145/3626461","DOIUrl":"https://doi.org/10.1145/3626461","url":null,"abstract":"This Special Issue of the Journal of Data and Information Quality (JDIQ) contains novel theoretical and methodological contributions as well as state-of-the-art reviews and research perspectives on quality aspects of data preparation. In this editorial, we summarize the scope of the issue and briefly describe its content.","PeriodicalId":44355,"journal":{"name":"ACM Journal of Data and Information Quality","volume":"1180 1","pages":"1 - 2"},"PeriodicalIF":2.1,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139294854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Human-in-the-Loop Ontology Curation Results through Task Design","authors":"Stefani Tsaneva, Marta Sabou","doi":"10.1145/3626960","DOIUrl":"https://doi.org/10.1145/3626960","url":null,"abstract":"The success of artificial intelligence (AI) applications is heavily dependant on the quality of data they rely on. Thus, data curation, dealing with cleaning, organising and managing data, has become a significant research area to be addressed. Increasingly, semantic data structures such as ontologies and knowledge graphs empower the new generation of AI systems. In this paper, we focus on ontologies, as a special type of data. Ontologies are conceptual data structures representing a domain of interest and are often used as a backbone to knowledge-based intelligent systems or as an additional input for machine learning algorithms. Low-quality ontologies, containing incorrectly represented information or controversial concepts modelled from a single viewpoint can lead to invalid application outputs and biased systems. Thus, we focus on the curation of ontologies as a crucial factor for ensuring trust in the enabled AI systems. While some ontology quality aspects can be automatically evaluated, others require a human-in-the-loop evaluation. Yet, despite the importance of the field several ontology quality aspects have not yet been addressed and there is a lack of guidelines for optimal design of human computation tasks to perform such evaluations. In this paper, we advance the state-of-the-art by making two novel contributions: First, we propose a human-computation (HC)-based approach for the verification of ontology restrictions - an ontology evaluation aspect that has not yet been addressed with HC techniques. Second, by performing two controlled experiments with a junior expert crowd, we empirically derive task design guidelines for achieving high-quality evaluation results related to i) the formalism for representing ontology axioms and ii) crowd qualification testing . We find that the representation format of the ontology does not significantly influence the campaign results, nevertheless, contributors expressed a preference in working with a graphical ontology representation. Additionally we show that an objective qualification test is better fitted at assessing contributors’ prior knowledge rather than a subjective self-assessment and that prior modelling knowledge of the contributors had a positive effect on their judgements. We make all artefacts designed and used in the experimental campaign publicly available.","PeriodicalId":44355,"journal":{"name":"ACM Journal of Data and Information Quality","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135347474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Editorial: Multimodality, Multidimensional Representation, and Multimedia Quality Assessment Toward Information Quality in Social Web of Things
Chinmay Chakraborty, Mohammad Hossein Khosravi, Muhammad Khurram Khan, Houbing Song
ACM Journal of Data and Information Quality, pp. 1-3, published 2023-09-30. DOI: 10.1145/3625102

Abstract: This editorial summarizes the content of the collection on Multimodality, Multidimensional Representation, and Multimedia Quality Assessment Toward Information Quality in Social Web of Things for the Journal of Data and Information Quality.

{"title":"Validating Synthetic Usage Data in Living Lab Environments","authors":"Timo Breuer, Norbert Fuhr, Philipp Schaer","doi":"10.1145/3623640","DOIUrl":"https://doi.org/10.1145/3623640","url":null,"abstract":"Evaluating retrieval performance without editorial relevance judgments is challenging, but instead, user interactions can be used as relevance signals. Living labs offer a way for small-scale platforms to validate information retrieval systems with real users. If enough user interaction data are available, click models can be parameterized from historical sessions to evaluate systems before exposing users to experimental rankings. However, interaction data are sparse in living labs, and little is studied about how click models can be validated for reliable user simulations when click data are available in moderate amounts. This work introduces an evaluation approach for validating synthetic usage data generated by click models in data-sparse human-in-the-loop environments like living labs. We ground our methodology on the click model's estimates about a system ranking compared to a reference ranking for which the relative performance is known. Our experiments compare different click models and their reliability and robustness as more session log data becomes available. In our setup, simple click models can reliably determine the relative system performance with already 20 logged sessions for 50 queries. In contrast, more complex click models require more session data for reliable estimates, but they are a better choice in simulated interleaving experiments when enough session data are available. While it is easier for click models to distinguish between more diverse systems, it is harder to reproduce the system ranking based on the same retrieval algorithm with different interpolation weights. Our setup is entirely open, and we share the code to reproduce the experiments.","PeriodicalId":44355,"journal":{"name":"ACM Journal of Data and Information Quality","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135925519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experience: Data Management for delivering COVID-19 relief in Panama","authors":"Luis Del Vasto-Terrientes","doi":"10.1145/3623511","DOIUrl":"https://doi.org/10.1145/3623511","url":null,"abstract":"A data-driven public sector recognizes data as a key element for implementing policies based on evidence. The open data movement has been a major catalyst for elevating data to a privileged position in many governments around the globe. In Panama, open data has enabled the improvement of data management in each institution. However, it is required to go further to create an integrated data-driven government with a common objective. Public institutions collect a huge amount of data that may never be used, and some others do not contain enough quality to provide trustworthy results. The state of emergency caused by the COVID-19 showed the necessity of establishing a common digital government vision for planning, delivering, and monitoring public services, as well as strengthening the technical foundation in the public sector to improve data value cycle: acquisition, storage, and exploitation. This paper reports from a data custodian perspective how the state of emergency worked as a catalyst to boost government data management, specifically for the Vale Digital program, a social relief linked to the identity card implemented by the Panamanian government during the COVID-19 pandemic, which may possibly be the greatest government data integration to date in terms of impact, data volume, rapid implementation, and institutions involved.","PeriodicalId":44355,"journal":{"name":"ACM Journal of Data and Information Quality","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136060963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Process-Data Quality: The True Frontier of Process Mining
A. H. M. ter Hofstede, A. Koschmider, Andrea Marrella, R. Andrews, D. Fischer, Sareh Sadeghianasl, M. Wynn, M. Comuzzi, Jochen De Weerdt, Kanika Goel, Niels Martin, P. Soffer
ACM Journal of Data and Information Quality, pp. 1-21, published 2023-08-23. DOI: 10.1145/3613247

Abstract: Since its emergence over two decades ago, process mining has flourished as a discipline, with numerous contributions to its theory, widespread practical applications, and mature support by commercial tooling environments. However, its potential for significant organisational impact is hampered by poor-quality event data. Process mining starts with the acquisition and preparation of event data coming from different data sources. These are then transformed into event logs, consisting of process execution traces comprising multiple events. In real-life scenarios, event logs suffer from significant data quality problems, which must be recognised and effectively resolved to obtain meaningful insights from process mining analysis. Despite its importance, the topic of data quality in process mining has received limited attention. In this paper, we discuss the emerging challenges related to process-data quality from both a research and a practical point of view. Additionally, we present a corresponding research agenda with key research directions.
