Caecilia Bintang Girik Allo, L. Putra, N. R. Paranoan, Vincentius Abdi Gunawan
{"title":"Comparing Logistic Regression and Support Vector Machine in Breast Cancer Problem","authors":"Caecilia Bintang Girik Allo, L. Putra, N. R. Paranoan, Vincentius Abdi Gunawan","doi":"10.34312/jjps.v4i1.19246","DOIUrl":null,"url":null,"abstract":"There are several methods used for the classification problems. There are many different kinds of fields that can be used. Nowadays, Support Vector Machine (SVM) is a popular classification method that has been proposed by many researchers. Using the same method but different distribution methods for creating training and testing data in the same dataset can yield varying results in terms of prediction accuracy, which is crucial in classification. In this paper, we compare the prediction accuracy between SVM results and Logistic Regression results to determine the better method to classify the current condition of the patient after undergoing some treatment. Several treatments are used in this paper, including feature selection, feature extraction, separating the train and testing data using Holdout and K-Fold CV. Stepwise selection is done to reduce the features. Training and testing dataset is obtained using the five stratified and non-stratified holdout and five fold stratified and non-stratified cross validation. The result shows that the best method to classify the cancer dataset is five fold stratified cross validation SVM with radial kernel. The obtained accuracy is 81,816% with variance as much as 0,94%.","PeriodicalId":315674,"journal":{"name":"Jambura Journal of Probability and Statistics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Jambura Journal of Probability and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34312/jjps.v4i1.19246","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
There are several methods used for the classification problems. There are many different kinds of fields that can be used. Nowadays, Support Vector Machine (SVM) is a popular classification method that has been proposed by many researchers. Using the same method but different distribution methods for creating training and testing data in the same dataset can yield varying results in terms of prediction accuracy, which is crucial in classification. In this paper, we compare the prediction accuracy between SVM results and Logistic Regression results to determine the better method to classify the current condition of the patient after undergoing some treatment. Several treatments are used in this paper, including feature selection, feature extraction, separating the train and testing data using Holdout and K-Fold CV. Stepwise selection is done to reduce the features. Training and testing dataset is obtained using the five stratified and non-stratified holdout and five fold stratified and non-stratified cross validation. The result shows that the best method to classify the cancer dataset is five fold stratified cross validation SVM with radial kernel. The obtained accuracy is 81,816% with variance as much as 0,94%.