Computerized Data-Preprocessing To Improve Data Quality

2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T). Pub Date: 2022-03-01. DOI: 10.1109/ICPC2T53885.2022.9776676

Rohan Gawhade, Lokesh Ramdev Bohara, Jesvin Mathew, Poonam Bari



Abstract

Machine Learning (ML) has grown rapidly over the past decades. Numerous resources and extensive documentation allow people to become ML practitioners. Companies make large profits from the analyses and predictions ML enables, and ML engineers are highly paid for their knowledge in this domain. The field has become prevalent and much more accessible. One of the most important stages in ML is data preprocessing and feature extraction. Data preprocessing itself comprises various tasks that must be performed accurately to make the provided data usable. From handling missing values to encoding and normalization, each step matters, so a professional must be adept at each of them. The preprocessing steps depend on the type of data provided, i.e., categorical data, continuous data, arrays of image pixels, or even images themselves. Because all of these cleaning steps must be handled, becoming an expert is quite strenuous. Moreover, manual preprocessing is time-consuming and does not guarantee the expected results. Hence, there is a need to address this issue. We aim to automate the complete process to ease the work of ML engineers and make it more productive. The user only has to provide the dataset and does not have to manually select the processing techniques, as is required by the latest data mining tools. The application inspects the dataset and applies suitable techniques on its own. Since all steps are automated and the user only provides the dataset, even people who are unfamiliar with ML concepts can preprocess a dataset. This opens up opportunities for people from various domains who wish to perform ML operations.
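The abstract does not say which techniques the application selects, so the following is only a minimal sketch of the idea of inspecting each column and choosing a treatment automatically. The function names `infer_kind` and `auto_preprocess` are hypothetical, and the specific choices (mean or mode imputation, min-max normalization, one-hot encoding) are assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def infer_kind(values):
    """Classify a column as 'numeric' or 'categorical' from its non-missing values."""
    present = [v for v in values if v is not None]
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return "numeric" if is_numeric else "categorical"


def auto_preprocess(rows):
    """Impute missing values, then normalize or one-hot encode each column.

    `rows` is a list of dicts that share the same keys; None marks a missing value.
    """
    if not rows:
        return []
    out = [{} for _ in rows]
    for col in rows[0].keys():
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        if infer_kind(values) == "numeric":
            # Assumed choice: fill gaps with the column mean, then min-max scale to [0, 1].
            mean = sum(present) / len(present)
            filled = [mean if v is None else v for v in values]
            lo, hi = min(filled), max(filled)
            span = (hi - lo) or 1.0  # avoid division by zero for constant columns
            for target, v in zip(out, filled):
                target[col] = (v - lo) / span
        else:
            # Assumed choice: fill gaps with the most frequent category, then one-hot encode.
            mode = Counter(present).most_common(1)[0][0] if present else "missing"
            filled = [mode if v is None else v for v in values]
            for category in sorted(set(filled), key=str):
                for target, v in zip(out, filled):
                    target[f"{col}={category}"] = 1 if v == category else 0
    return out


sample = [
    {"age": 20, "city": "Pune"},
    {"age": None, "city": "Mumbai"},
    {"age": 40, "city": None},
    {"age": 30, "city": "Pune"},
]
for row in auto_preprocess(sample):
    print(row)
```

In this sketch the caller supplies only the dataset; the column type, inferred from the values, decides which steps run. Image data, which the abstract also mentions, is outside the scope of this example.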


