Anisah Andini, B. E. Manurung, Marvel Sugi, Septasia Dwi Angfika, S. Harimurti, W. Adiprawita, Isa Anshori
Pattern Recognition using Machine Learning for Cancer Classification
2019 Asia Pacific Conference on Research in Industrial and Systems Engineering (APCoRISE), published 2019-04-18
DOI: 10.1109/APCoRISE46197.2019.9318819 (https://doi.org/10.1109/APCoRISE46197.2019.9318819)
Citation count: 1
Abstract
This paper applies machine learning to gene expression datasets in order to classify cancer cells. Several analytical methods, including Principal Component Analysis (PCA), Support Vector Machine (SVM), Gradient Boosting, and XGBoost, are evaluated to find the best model for processing the datasets. Classification with hyperparameter tuning using GridSearch and RandomSearch is also performed. The dataset is obtained from the study published by Golub et al. [1], who reported how new cases of cancer could be classified by gene expression monitoring via DNA microarray, thereby providing a general approach to identifying new classes of cancer and assigning tumors to known classes. The datasets were used to classify patients diagnosed with acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). They contain measurements corresponding to ALL and AML samples from bone marrow and peripheral blood. Based on the simulation results, PCA with K-Nearest Neighbor gives the best result, with a classification accuracy of 82%.
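
To make the workflow in the abstract concrete, the following is a minimal sketch of a PCA plus K-Nearest Neighbor classifier tuned with a grid search. It is not the authors' code. It assumes scikit-learn as the toolkit, uses a synthetic matrix from `make_classification` as a stand-in for the gene expression data, and picks an illustrative parameter grid, a standardization step, and a train/test split that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for a gene expression matrix
# (rows = patient samples, columns = genes, labels = two leukemia classes).
# The sizes here are illustrative, not taken from the paper.
X, y = make_classification(
    n_samples=72,
    n_features=500,
    n_informative=20,
    n_redundant=0,
    n_classes=2,
    random_state=0,
)

# Hold out part of the data so tuning never sees the evaluation samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize each feature, project onto a few principal components,
# then classify by majority vote among the nearest neighbors.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(random_state=0)),
    ("knn", KNeighborsClassifier()),
])

# ASSUMPTION: a small illustrative grid; the paper's search ranges are not given.
param_grid = {
    "pca__n_components": [2, 5, 10],
    "knn__n_neighbors": [1, 3, 5],
}

# Exhaustive grid search with 3-fold cross-validation on the training split.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best model

print("best parameters:", best_params)
print("held-out accuracy:", round(test_accuracy, 3))
```

Placing PCA inside the pipeline means the projection is refit on each cross-validation training fold, so the held-out fold does not leak into the dimensionality reduction. Swapping `GridSearchCV` for `RandomizedSearchCV` would give the random-search variant of the same tuning step.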