Ankur Jariwala, Aayushi Chaudhari, C. Bhatt, Dac-Nhuong Le
{"title":"Data Quality for AI Tool: Exploratory Data Analysis on IBM API","authors":"Ankur Jariwala, Aayushi Chaudhari, C. Bhatt, Dac-Nhuong Le","doi":"10.5815/ijisa.2022.01.04","DOIUrl":null,"url":null,"abstract":"A huge amount of data is produced in every domain these days. Thus for applying automation on any dataset, the appropriately trained data plays an important role in achieving efficient and accurate results. According to data researchers, data scientists spare 80% of their time in preparing and organizing the data. To overcome this tedious task, IBM Research has developed a Data Quality for AI tool, which has varieties of metrics that can be applied to different datasets (in .csv format) to identify the quality of data. In this paper, we will be representing how the IBM API toolkit will be useful for different variants of datasets and showcase the results for each metrics in graphical form. This paper might be found useful for the readers to understand the working flow of the IBM data purifier tool, thus we have represented the entire flow of how to use IBM data quality for the AI toolkit in the form of architecture.","PeriodicalId":14067,"journal":{"name":"International Journal of Intelligent Systems and Applications in Engineering","volume":"65 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Intelligent Systems and Applications in Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/ijisa.2022.01.04","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 2
Abstract
A huge amount of data is produced in every domain these days. Thus for applying automation on any dataset, the appropriately trained data plays an important role in achieving efficient and accurate results. According to data researchers, data scientists spare 80% of their time in preparing and organizing the data. To overcome this tedious task, IBM Research has developed a Data Quality for AI tool, which has varieties of metrics that can be applied to different datasets (in .csv format) to identify the quality of data. In this paper, we will be representing how the IBM API toolkit will be useful for different variants of datasets and showcase the results for each metrics in graphical form. This paper might be found useful for the readers to understand the working flow of the IBM data purifier tool, thus we have represented the entire flow of how to use IBM data quality for the AI toolkit in the form of architecture.