{"title":"Multi-approaches on scrubbing data for medium-sized enterprises","authors":"Tauqeer Faiz","doi":"10.1109/ICD47981.2019.9105739","DOIUrl":null,"url":null,"abstract":"Tidy and fit for purpose data are the prerequisite for analyzing data and for guaranteeing good business decisions. Data Scrubbing or data cleaning is the process of identifying errors and inconsistencies in the data and fixing these errors before analyzing the data. Organization's decisions rely on Data Quality which makes data scrubbing a very important step towards their productivity. Untidy data includes; importing data from multiple sources, missing values or corrupt records, data types mismatch, special character removal or discarding duplicates. Current research is lacking the latest data scrubbing techniques practiced by the medium sized enterprises. This article highlights possible data errors, literature review, and data science project life cycle. The document explains how to clean data using Python libraries for exploratory data analysis such as Pandas, NumPy, Scikit- Learn and libraries for data visualization for example matplotlib, Seaborn, and Plotly.","PeriodicalId":277894,"journal":{"name":"2019 International Conference on Digitization (ICD)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Digitization (ICD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICD47981.2019.9105739","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Tidy, fit-for-purpose data are a prerequisite for analyzing data and for making sound business decisions. Data scrubbing, or data cleaning, is the process of identifying errors and inconsistencies in the data and fixing them before analysis. Because an organization's decisions rely on data quality, data scrubbing is an important step toward its productivity. Untidy data arises from importing data from multiple sources, missing values or corrupt records, data-type mismatches, stray special characters, and duplicate records. Current research lacks coverage of the latest data scrubbing techniques practiced by medium-sized enterprises. This article highlights possible data errors, reviews the literature, and outlines the data science project life cycle. It explains how to clean data using Python libraries for exploratory data analysis such as Pandas, NumPy, and Scikit-learn, and libraries for data visualization such as Matplotlib, Seaborn, and Plotly.
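As a concrete illustration of the scrubbing steps named in the abstract, the following is a minimal Pandas sketch; the CSV file name and column names are hypothetical placeholders, not taken from the paper. It drops duplicates, coerces a mismatched data type, handles missing values, and strips special characters.

```python
import pandas as pd

# Import data from a source (hypothetical CSV file and columns).
df = pd.read_csv("sales_records.csv")

# Discard exact duplicate rows.
df = df.drop_duplicates()

# Fix a data-type mismatch: coerce a numeric column, turning corrupt entries into NaN.
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# Handle missing values: fill numeric gaps with the column median,
# then drop rows that still lack a key identifier.
df["amount"] = df["amount"].fillna(df["amount"].median())
df = df.dropna(subset=["customer_id"])

# Remove special characters from a text column with a regular expression.
df["customer_name"] = df["customer_name"].str.replace(r"[^A-Za-z0-9 ]", "", regex=True)

print(df.info())
```

The same pipeline could be extended with Scikit-learn imputers or visualized with Matplotlib or Seaborn, as the paper discusses.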