Liver Cancer Prediction Using Synthetic Minority based on Probabilistic Distribution (SyMProD) Oversampling Technique

Intouch Kunakorntum, Woranich Hinthong, Sumet Amonyingchareon, P. Phunchongharn
2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), published 2019-10-01
DOI: 10.1109/ICAwST.2019.8923122 (https://doi.org/10.1109/ICAwST.2019.8923122)

Abstract

Liver cancer is generally difficult to diagnose. Its prediction is further hindered by two data problems: imbalance between the majority and minority classes, and missing values. Many existing prediction models address neither limitation, so classification results can ignore minority instances (i.e., patients with liver cancer go undetected). In this paper, we present a liver cancer prediction model with a new oversampling technique, Synthetic Minority based on Probabilistic Distribution (SyMProD), to handle skewed patient data from Chulabhorn Hospital. SyMProD removes noisy data based on z-score normalized values and adaptively selects reference data using a probability distribution derived from the ratio of the minority and majority closeness factors. The proposed method then oversamples minority instances from several minority nearest neighbors to cover the distribution. We employ Random Forest (RF) and Gradient Boosted Tree (GBT) to build prediction models, evaluated with stratified five-fold cross-validation. Results demonstrate that GBT with the proposed oversampling technique achieves better results than other techniques. The technique generates new instances within the minority distribution, avoids the majority region, removes the overgeneralization problem, and reduces the possibility of creating noise and overlapping classes. Our prediction model may help prompt high-risk patients to get a proper diagnosis and treatment in time.
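The three steps named in the abstract (z-score noise removal, probability-based reference selection, synthesis from several minority neighbors) can be sketched roughly as follows. This is an illustrative approximation only, not the authors' implementation: the abstract does not define the closeness factors, so the sketch assumes closeness is the inverse of the mean distance to the `k` nearest points of a class. The function names, the threshold `z_thresh`, the neighbor count `k`, and the random convex weights are likewise hypothetical choices made for the example.

```python
import numpy as np


def mean_knn_distance(points, others, k, skip_self=False):
    """Mean Euclidean distance from each row of `points` to its k nearest rows of `others`."""
    d = np.linalg.norm(points[:, None, :] - others[None, :, :], axis=2)
    d = np.sort(d, axis=1)
    start = 1 if skip_self else 0  # column 0 is the zero self-distance
    return d[:, start:start + k].mean(axis=1)


def oversample_sketch(X_min, X_maj, n_new, k=3, z_thresh=3.0, seed=0):
    """Hypothetical SyMProD-like oversampler; every formula here is an assumption."""
    rng = np.random.default_rng(seed)

    # Step 1 (assumed form): treat minority points whose z-score exceeds
    # z_thresh on any feature as noise and drop them.
    mu = X_min.mean(axis=0)
    sigma = X_min.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant features
    z = np.abs((X_min - mu) / sigma)
    kept = X_min[(z <= z_thresh).all(axis=1)]

    # Step 2 (assumed form): closeness = 1 / mean distance to the k nearest
    # points of a class. Points close to other minority points and far from
    # majority points get a higher selection probability.
    eps = 1e-12
    c_min = 1.0 / (mean_knn_distance(kept, kept, k, skip_self=True) + eps)
    c_maj = 1.0 / (mean_knn_distance(kept, X_maj, k) + eps)
    ratio = c_min / c_maj
    prob = ratio / ratio.sum()

    # Step 3 (assumed form): each synthetic point is a random convex
    # combination of a sampled reference point and its k nearest minority
    # neighbors, so it stays inside the region those points span.
    d = np.linalg.norm(kept[:, None, :] - kept[None, :, :], axis=2)
    group = np.argsort(d, axis=1)[:, :k + 1]  # each point plus k neighbors
    refs = rng.choice(len(kept), size=n_new, p=prob)
    w = rng.dirichlet(np.ones(k + 1), size=n_new)  # rows sum to 1
    return np.einsum("nj,njf->nf", w, kept[group[refs]])


# Toy data (made up for illustration): 20 minority points near the origin,
# one obvious outlier, and a majority cluster centred at (6, 6).
rng = np.random.default_rng(42)
X_min = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)), [[50.0, 50.0]]])
X_maj = rng.normal(6.0, 1.0, size=(200, 2))
new = oversample_sketch(X_min, X_maj, n_new=30)
print(new.shape)
```

Because the synthetic points are convex combinations of retained minority points, none of them can land near the filtered outlier. In practice the oversampler would be applied only to the training folds of the stratified five-fold split, so that synthetic points never leak into the evaluation data.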