Intouch Kunakorntum, Woranich Hinthong, Sumet Amonyingchareon, P. Phunchongharn
2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), 2019-10-01. DOI: 10.1109/ICAwST.2019.8923122 (https://doi.org/10.1109/ICAwST.2019.8923122)
Liver Cancer Prediction Using Synthetic Minority based on Probabilistic Distribution (SyMProD) Oversampling Technique
Liver cancer is generally challenging to diagnose. Moreover, liver cancer prediction can be hindered by two data problems: class imbalance between the majority and minority classes, and missing values. Many existing prediction models do not address these two limitations, which can cause classifiers to ignore minority instances (i.e., patients with liver cancer go undetected). In this paper, we present a liver cancer prediction model with a new oversampling technique called Synthetic Minority based on Probabilistic Distribution (SyMProD) to handle skewed patient data from Chulabhorn Hospital. SyMProD removes noisy data based on z-score normalized values and adaptively selects reference instances using a probability distribution derived from the ratio of the minority and majority closeness factors. The proposed method generates each synthetic minority instance from several nearest minority neighbors so that the new instances cover the minority distribution. We employ Random Forest (RF) and Gradient Boosted Tree (GBT) to build prediction models, evaluated with stratified five-fold cross-validation. Results demonstrate that GBT with our proposed oversampling technique achieves better results than the other techniques. Our technique generates new instances within the minority distribution, avoids the majority region, addresses the overgeneralization problem, and reduces the likelihood of creating noise and overlapping classes. Our prediction model may help prompt high-risk patients to get a proper diagnosis and treatment in time.
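The abstract names three steps (z-score based noise removal, probabilistic selection of reference instances from a closeness ratio, and synthesis from several minority neighbors) without giving formulas. The Python sketch below is one possible reading of those steps, not the authors' implementation. All function names, the noise threshold of 3, the closeness factor defined as a sum of 1/(1+d) over the k nearest points, the direction of the ratio (minority over majority), and the random convex combination used for synthesis are assumptions made for illustration.

```python
import math
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every feature column using the pooled mean and std."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]


def drop_noise(rows, threshold=3.0):
    """Keep rows whose largest absolute z-score is within the threshold (assumed rule)."""
    return [r for r in rows if max(abs(v) for v in r) <= threshold]


def closeness(point, others, k):
    """Assumed closeness factor: sum of 1/(1+d) over the k nearest points."""
    dists = sorted(math.dist(point, o) for o in others)
    return sum(1.0 / (1.0 + d) for d in dists[:k])


def selection_probabilities(minority, majority, k=5):
    """Normalize the minority/majority closeness ratio into a distribution."""
    ratios = []
    for i, p in enumerate(minority):
        rest = minority[:i] + minority[i + 1:]  # exclude the point itself
        ratios.append(closeness(p, rest, k) / closeness(p, majority, k))
    total = sum(ratios)
    return [r / total for r in ratios]


def oversample(minority, majority, n_new, k=5, m=3, seed=0):
    """Generate n_new synthetic minority points (illustrative only)."""
    rng = random.Random(seed)
    probs = selection_probabilities(minority, majority, k)
    synthetic = []
    for _ in range(n_new):
        # Points close to other minority points and far from the majority
        # are more likely to be chosen as the reference.
        ref = rng.choices(minority, weights=probs, k=1)[0]
        neighbours = sorted(minority, key=lambda q: math.dist(ref, q))[1:m + 1]
        group = [ref] + neighbours
        weights = [1.0 - rng.random() for _ in group]  # each in (0, 1], never zero
        total = sum(weights)
        # Random convex combination of the reference and its neighbours.
        synthetic.append(tuple(
            sum(w * q[j] for w, q in zip(weights, group)) / total
            for j in range(len(ref))
        ))
    return synthetic


# Toy data, invented for this example (not from the paper).
minority_raw = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8)]
majority_raw = [(3.0, 3.0), (3.2, 2.8), (2.9, 3.1), (3.1, 3.3),
                (2.7, 2.9), (3.3, 3.0), (2.8, 3.2), (3.0, 2.7)]
scaled = standardize(minority_raw + majority_raw)
minority = drop_noise(scaled[:len(minority_raw)])
majority = drop_noise(scaled[len(minority_raw):])
new_points = oversample(minority, majority, n_new=3)
```

Because each synthetic point is a convex combination of existing minority points, it stays inside the region those points span, which is one way to read the abstract's claim about avoiding the majority region. When such a sampler is combined with stratified cross-validation, it is usually applied to the training folds only, so that no synthetic instance leaks into the held-out fold.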