Ovarian Cancer Prediction Using PCA, K-PCA, ICA and Random Forest

Journal of Intelligent Systems with Applications, published 2021-12-27, DOI: 10.54856/jiswa.202112168

Asiye Şahin, Nermin Ozcan, G. Nur


Citations: 1

Abstract

Ovarian cancer, one of the most common cancers in women, occurs mostly after menopause and develops through the uncontrolled proliferation of cells in the ovaries and the formation of tumors. Early diagnosis is very difficult, and in most cases the cancer is already at an advanced stage when it is first diagnosed. While it can often be treated successfully in the early stages, when it is confined to the ovary, it is harder to treat in the advanced stages and is often fatal. For this reason, research has focused on predicting whether a person has ovarian cancer. In this study, we designed a random forest (RF)-based ovarian cancer prediction model using a data set of 349 real patients described by 49 features, including routine blood tests, general chemistry tests, and tumor markers. Because high-dimensional data increase the time and resources required, we reduced the dimensionality of the data with principal component analysis (PCA), kernel PCA (K-PCA), and independent component analysis (ICA) and examined the effect on prediction performance and training time. The best result, an F1 score of 0.895, was obtained by training the RF model on the data whose dimension PCA reduced from 49 to 6; training the model took 18.191 seconds. This result was both more accurate and faster to train than the prediction made on the full 49-feature data without dimension reduction. The study shows that, for machine learning predictions on large-scale medical data, dimension reduction methods can save time and resources while also improving prediction results.
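
The workflow summarized in the abstract (reduce 49 features to 6 components with PCA, K-PCA, or ICA, train an RF classifier, and compare F1 score and training time against the unreduced data) can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the patient data are replaced by a synthetic 349-by-49 data set from `make_classification`, and the standardization step, the RBF kernel for K-PCA, the 75/25 stratified split, and the default RF settings are hypothetical choices not taken from the paper. Its numbers will therefore not reproduce the reported F1 of 0.895 or the 18.191-second training time.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA, KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 349 samples x 49 features, binary labels.
# The real blood-test / tumor-marker data set is not used here.
X, y = make_classification(n_samples=349, n_features=49, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each reducer maps the 49 features to 6 components; None means no reduction.
# The RBF kernel for KernelPCA is a hypothetical choice.
reducers = {
    "none (49 features)": None,
    "PCA": PCA(n_components=6, random_state=0),
    "K-PCA": KernelPCA(n_components=6, kernel="rbf", random_state=0),
    "ICA": FastICA(n_components=6, max_iter=1000, random_state=0),
}


def evaluate(reducer):
    """Fit scaler -> optional reducer -> random forest; return (F1, fit seconds)."""
    steps = [StandardScaler()]
    if reducer is not None:
        steps.append(reducer)
    steps.append(RandomForestClassifier(random_state=0))
    model = make_pipeline(*steps)

    start = time.perf_counter()
    model.fit(X_train, y_train)  # timing covers the scaler, reducer, and forest
    elapsed = time.perf_counter() - start

    f1 = f1_score(y_test, model.predict(X_test))
    return f1, elapsed


results = {name: evaluate(r) for name, r in reducers.items()}
for name, (f1, secs) in results.items():
    print(f"{name:20s} F1={f1:.3f} train_time={secs:.3f}s")
```

Placing each reducer inside the pipeline means it is fitted only on the training split, so the test set does not leak into the learned components; the measured time therefore includes fitting both the reducer and the forest.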

