Analysis of Different Pre-Processing Techniques to the Development of Machine Learning Predictors with Gene Expression Profiles
Authors: Ian Durán, Roberto Leandro, J. Guevara-Coto
Venue: 2019 IV Jornadas Costarricenses de Investigación en Computación e Informática (JoCICI)
Publication date: 2019-08-01
DOI: 10.1109/JoCICI48395.2019.9105145
Citations: 1
Abstract
In this ongoing research work we analyze the impact of several data pre-processing and normalization methods on the performance of Support Vector Machines (SVM). The pre-processing methods are: deleting missing values, and imputing them with zeroes, means, medians, modes, K-Nearest Neighbors (KNN), or Predictive Mean Matching (PMM). Each of these pre-processing methods will be paired with two normalization methods, log2 and Z-score. After training, the models will be tested on a validation set derived from the training set, which represents an unseen partition of the data. Performance metrics will then be obtained for each model and compared across models for both training performance and test performance. These comparisons will then be analyzed and interpreted. The aim of our work is to characterize the impact of pre-processing approaches on predictor construction and, potentially, to identify a standard method for expression profile analysis using machine learning.
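The pairing of pre-processing and normalization methods described in the abstract can be sketched as a small grid. The following is a minimal illustration using only the Python standard library, not the authors' implementation: KNN and PMM imputation and the SVM training step are left out, and the toy matrix, the function names, and the pseudocount in the log2 transform are all assumptions made for the example.

```python
import math
import statistics
from itertools import product

# Hypothetical toy expression matrix: rows are samples, columns are genes,
# None marks a missing value. Illustrative numbers only, not study data.
MATRIX = [
    [8.0, 2.0, None],
    [4.0, None, 16.0],
    [4.0, 2.0, 4.0],
    [16.0, 8.0, 4.0],
]


def drop_missing(rows):
    """Deletion: keep only samples that have no missing values."""
    return [list(r) for r in rows if None not in r]


def impute(rows, strategy):
    """Fill each gene's missing values with zero or a per-gene statistic."""
    fillers = {
        "zero": lambda observed: 0.0,
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    columns = list(zip(*rows))
    fills = [
        fillers[strategy]([v for v in col if v is not None]) for col in columns
    ]
    return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def log2_transform(rows, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount (assumed) keeps zeros finite."""
    return [[math.log2(v + pseudocount) for v in r] for r in rows]


def z_score(rows):
    """Standardize each gene (column) to mean 0 and unit sample std dev."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # A constant column has std dev 0; fall back to 1.0 to avoid dividing by 0.
    stdevs = [statistics.stdev(c) or 1.0 for c in columns]
    return [[(v - means[j]) / stdevs[j] for j, v in enumerate(r)] for r in rows]


PREPROCESSORS = {
    "delete": drop_missing,
    "zero": lambda rows: impute(rows, "zero"),
    "mean": lambda rows: impute(rows, "mean"),
    "median": lambda rows: impute(rows, "median"),
    "mode": lambda rows: impute(rows, "mode"),
}
NORMALIZERS = {"log2": log2_transform, "z-score": z_score}


def run_grid(rows):
    """Apply every (pre-processing, normalization) pair to the same matrix."""
    return {
        (p, n): NORMALIZERS[n](PREPROCESSORS[p](rows))
        for p, n in product(PREPROCESSORS, NORMALIZERS)
    }


if __name__ == "__main__":
    for (prep, norm), data in run_grid(MATRIX).items():
        print(f"{prep:>6} + {norm:<7} -> {len(data)} samples")
```

With the five pre-processing options shown here the grid has 10 configurations; adding KNN and PMM, as in the abstract, would give 14. In a full experiment each resulting matrix would be split into training and validation partitions before fitting an SVM, with imputation and scaling statistics ideally estimated on the training partition only.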