The Impact of Data Completeness and Correctness on Explainable Machine Learning Models

J. Data Intell. Pub Date: 2022-05-01 DOI: 10.26421/jdi3.2-2

Shelernaz Azimi, C. Pahl


Citations: 1

Abstract

Many systems in the Edge Cloud, the Internet of Things, and Cyber-Physical Systems are built to process data that is delivered from sensors and devices, transported, processed, and consumed locally by actuators. Given the regularly high volume of this data, Artificial Intelligence (AI) techniques such as Machine Learning (ML) can be used to generate the required application and management functions. The quality of both the source data and the ML model is therefore highly significant, yet the explicit connection between the quality of models created through ML procedures and the quality of the data consumed in their construction has not been explored sufficiently. Here, we investigate the link between the quality of the input data used for ML function construction and the quality of the resulting functions in data-driven software systems, working towards explainable model construction through an experimental approach with IoT data using decision trees. This research has three objectives: (1) search for indicators of the influence of data quality factors, such as correctness and completeness, and of model construction factors on accuracy, precision, and recall; (2) estimate the impact of variations in model construction and data quality; (3) identify change patterns that can be attributed to specific input changes. The ultimate aim is to support explainable AI, i.e., a better understanding of how ML models work and what affects their quality.
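The kind of experiment the abstract describes can be illustrated with a minimal sketch. It is not the authors' setup: the synthetic "sensor" data, the 30-degree labelling rule, the mean imputation of missing values, the depth-1 decision tree (a stump), and all function names below are assumptions made for illustration only. The sketch lowers the completeness and correctness of the training data at chosen rates, fits the tree, and reports accuracy, precision, and recall on clean test data.

```python
import random


def make_data(n, rng):
    """Synthetic sensor readings (assumed); the label is 1 when the reading exceeds 30."""
    return [(t, 1 if t > 30.0 else 0)
            for t in (rng.uniform(0.0, 60.0) for _ in range(n))]


def degrade(rows, missing_rate, error_rate, rng):
    """Inject completeness defects (value removed) and correctness defects (value replaced)."""
    out = []
    for value, label in rows:
        r = rng.random()
        if r < missing_rate:
            value = None                      # completeness defect
        elif r < missing_rate + error_rate:
            value = rng.uniform(0.0, 60.0)    # correctness defect
        out.append((value, label))
    return out


def impute_mean(rows):
    """Replace missing values with the mean of the observed ones (an assumed strategy)."""
    observed = [v for v, _ in rows if v is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [(mean if v is None else v, y) for v, y in rows]


def fit_stump(rows):
    """Depth-1 decision tree: pick the threshold with the fewest training errors."""
    best_thr, best_err = None, len(rows) + 1
    for thr in sorted({v for v, _ in rows}):
        err = sum(1 for v, y in rows if (1 if v > thr else 0) != y)
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr


def metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def run_experiment(missing_rate, error_rate, seed=0):
    """Train on degraded data, evaluate on clean data."""
    rng = random.Random(seed)
    train = degrade(make_data(400, rng), missing_rate, error_rate, rng)
    test = make_data(400, rng)                # evaluation data is left clean
    thr = fit_stump(impute_mean(train))
    y_true = [y for _, y in test]
    y_pred = [1 if v > thr else 0 for v, _ in test]
    return metrics(y_true, y_pred)


if __name__ == "__main__":
    for missing, wrong in [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3), (0.3, 0.3)]:
        acc, prec, rec = run_experiment(missing, wrong)
        print(f"missing={missing:.1f} wrong={wrong:.1f} "
              f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Varying one defect rate at a time while holding the other at zero is what would let a change in a metric be attributed to a specific input change, in the spirit of the third objective; a real study would use deeper trees and actual IoT data.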

