Intouch Kunakorntum, Woranich Hinthong, Sumet Amonyingchareon, P. Phunchongharn
Liver Cancer Prediction Using Synthetic Minority based on Probabilistic Distribution (SyMProD) Oversampling Technique
2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), published 2019-10-01
DOI: 10.1109/ICAwST.2019.8923122 (https://doi.org/10.1109/ICAwST.2019.8923122)
Citations: 0
Abstract
Liver cancer is generally difficult to diagnose. Moreover, liver cancer prediction can be hindered by class imbalance (skewed data between majority and minority classes) and by missing values. Many existing prediction models do not address these two limitations, which can cause classifiers to ignore minority instances (i.e., patients with liver cancer go undetected). In this paper, we present a liver cancer prediction model with a new oversampling technique called Synthetic Minority based on Probabilistic Distribution (SyMProD) to handle skewed patient data from Chulabhorn Hospital. SyMProD removes noisy data based on z-score normalized values and adaptively selects reference data using a probability distribution derived from the ratio of the minority and majority closeness factors. The proposed method synthesizes each new minority instance from several minority nearest neighbors to cover the minority distribution. We employ Random Forest (RF) and Gradient Boosted Tree (GBT) to build prediction models, evaluated with stratified five-fold cross-validation. Results demonstrate that GBT with our proposed oversampling technique achieves better results than other techniques. This is because our technique generates new instances within the minority distribution, avoids the majority region, removes the overgeneralization problem, and reduces the possibility of creating noise and overlapping classes. Our prediction model may help prompt high-risk patients to get a proper diagnosis and treatment in time.
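The oversampling steps named in the abstract (z-score noise removal, probability-based selection of reference points from a closeness ratio, and synthesis from several minority nearest neighbors) can be illustrated with a minimal NumPy sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the function name `symprod_like_oversample`, the definition of the closeness factor as a sum of inverse distances to the `k` nearest points, the z-score cutoff `z_cut`, and the Dirichlet-weighted combination of `m` neighbors. It should not be read as the authors' implementation.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))


def symprod_like_oversample(X_min, X_maj, n_new, k=5, m=3, z_cut=3.0, seed=0):
    """Loose SyMProD-style oversampler; all formulas below are assumptions."""
    rng = np.random.default_rng(seed)

    # 1) Noise removal: drop minority rows whose z-score exceeds z_cut
    #    in any feature (assumed rule).
    mu, sd = X_min.mean(axis=0), X_min.std(axis=0) + 1e-12
    keep = (np.abs((X_min - mu) / sd) <= z_cut).all(axis=1)
    P = X_min[keep]

    # 2) Closeness factors: sum of inverse distances to the k nearest
    #    minority and majority points (assumed definition).
    d_pp = pairwise_dist(P, P)
    d_min = np.sort(d_pp, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    d_maj = np.sort(pairwise_dist(P, X_maj), axis=1)[:, :k]
    c_min = (1.0 / (d_min + 1e-12)).sum(axis=1)
    c_maj = (1.0 / (d_maj + 1e-12)).sum(axis=1)

    # 3) Selection probability from the minority/majority closeness ratio:
    #    points deep in the minority region are chosen more often.
    ratio = c_min / c_maj
    prob = ratio / ratio.sum()

    # 4) Each synthetic point is a random convex combination of a reference
    #    point and m of its k nearest minority neighbors.
    nn = np.argsort(d_pp, axis=1)[:, 1:k + 1]
    out = np.empty((n_new, P.shape[1]))
    for i in range(n_new):
        ref = rng.choice(len(P), p=prob)
        picks = rng.choice(nn[ref], size=m, replace=False)
        w = rng.dirichlet(np.ones(m + 1))
        out[i] = w @ np.vstack([P[ref], P[picks]])
    return out


# Toy data (hypothetical, not the hospital data set).
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(3.0, 0.5, size=(20, 2))
X_new = symprod_like_oversample(X_min, X_maj, n_new=180)
print(X_new.shape)  # (180, 2)
```

Because each synthetic point is a convex combination of retained minority points, it stays inside their convex hull, which is one simple way to keep new instances within the minority distribution and away from the majority region. The sketch requires more than `k` retained minority points and `m <= k`.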