Authors: Ian Durán, Roberto Leandro, J. Guevara-Coto
Published in: 2019 IV Jornadas Costarricenses de Investigación en Computación e Informática (JoCICI), 2019-08-01
DOI: 10.1109/JoCICI48395.2019.9105145
Analysis of Different Pre-Processing Techniques to the Development of Machine Learning Predictors with Gene Expression Profiles
In this ongoing research work, we analyze the impact of several data pre-processing and normalization methods on the performance of Support Vector Machines (SVM). The pre-processing methods are: deleting missing values, and imputing them with zeroes, means, medians, modes, K-Nearest Neighbors (KNN), or Predictive Mean Matching (PMM). Each pre-processing method is paired with two normalization methods, log2 and Z-score. After training, the models are tested on a validation set, derived from the training set, that represents an unseen partition of the data. Performance metrics are then obtained and compared across the models for both training and test performance, and these comparisons are analyzed and interpreted. The aim of our work is to identify the impact of pre-processing approaches on predictor construction and, potentially, a standard method for expression profile analysis using machine learning.
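The simpler pre-processing and normalization steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of `None` to mark a missing value, and the `pseudocount` added before the log2 transform are all assumptions made for illustration. Deletion, KNN, and PMM imputation, as well as the SVM itself, are left out to keep the example short.

```python
import math
import statistics


def impute(column, strategy):
    """Fill missing entries (None) of one expression column.

    `strategy` is one of "zero", "mean", "median", "mode".
    The names are hypothetical, chosen for this sketch only.
    """
    observed = [v for v in column if v is not None]
    if strategy == "zero":
        fill = 0.0
    elif strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in column]


def log2_transform(column, pseudocount=1.0):
    """log2 normalization; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in column]


def z_score(column):
    """Z-score normalization using the population standard deviation."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


if __name__ == "__main__":
    raw = [1.0, None, 3.0, 3.0]  # toy values, not real expression data
    for strategy in ("zero", "mean", "median", "mode"):
        filled = impute(raw, strategy)
        print(strategy, log2_transform(filled), z_score(filled))
```

In a setup like the one the abstract describes, the fill values and the mean and standard deviation would normally be computed on the training partition only and then reused on the validation partition, so that the held-out data stays unseen.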