Feature reduction for hepatocellular carcinoma prediction using machine learning algorithms

IF 6.4 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, THEORY & METHODS

Journal of Big Data Pub Date : 2024-06-18 DOI:10.1186/s40537-024-00944-3

Ghada Mostafa, Hamdi Mahmoud, Tarek Abd El-Hafeez, Mohamed E. ElAraby

Citations: 0

Abstract

Hepatocellular carcinoma (HCC) is a highly prevalent form of liver cancer, and accurate prediction models are needed for early diagnosis and effective treatment. Machine learning algorithms have shown promising results in various medical domains, including cancer prediction. In this study, we propose a comprehensive approach to HCC prediction that compares the performance of different machine learning algorithms before and after feature reduction. We employ popular feature reduction techniques, such as feature weighting, hidden feature correlation, feature selection, and optimized selection, to extract a reduced feature subset that captures the information most relevant to HCC. We then apply multiple algorithms, including Naive Bayes, support vector machines (SVM), neural networks, decision trees, and k-nearest neighbors (KNN), to both the original high-dimensional dataset and the reduced feature set. By comparing the accuracy, precision, recall, F-score, and execution time of each algorithm, we assess how effectively feature reduction improves HCC prediction models. Our experimental results, obtained on a comprehensive dataset of clinical features of HCC patients, demonstrate that feature reduction significantly improves the performance of all examined algorithms. Notably, the reduced feature set consistently outperforms the original high-dimensional dataset in prediction accuracy and execution time. After feature reduction, decision trees, Naive Bayes, KNN, neural networks, and SVM achieved accuracies of 96.00%, 97.33%, 94.67%, 96.00%, and 96.00%, respectively.
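The before-and-after comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the clinical HCC data, and substitutes a simple univariate filter (`SelectKBest` with `f_classif`) for the paper's feature reduction techniques, whose exact implementation is not given here. The helper names `make_models` and `evaluate`, the feature count `k_features=10`, and all hyperparameters are hypothetical choices.

```python
import time

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_models():
    """Return fresh instances of the five classifier families named in the abstract."""
    return {
        "Naive Bayes": GaussianNB(),
        "SVM": SVC(),
        "Neural network": MLPClassifier(max_iter=1000, random_state=0),
        "Decision tree": DecisionTreeClassifier(random_state=0),
        "KNN": KNeighborsClassifier(),
    }


def evaluate(X_train, X_test, y_train, y_test, k_features=None):
    """Fit each model, optionally after univariate feature selection, and score it."""
    results = {}
    for name, model in make_models().items():
        steps = [StandardScaler()]
        if k_features is not None:
            # Assumption: a univariate F-test filter stands in for the paper's methods.
            steps.append(SelectKBest(f_classif, k=k_features))
        steps.append(model)
        pipeline = make_pipeline(*steps)

        # Time fitting plus prediction together as a rough "execution time".
        start = time.perf_counter()
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        elapsed = time.perf_counter() - start

        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
            "seconds": elapsed,
        }
    return results


# Synthetic stand-in data (NOT the HCC dataset): 40 features, only a few informative.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=6, n_redundant=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

baseline = evaluate(X_train, X_test, y_train, y_test)                # all 40 features
reduced = evaluate(X_train, X_test, y_train, y_test, k_features=10)  # top 10 features

for name in baseline:
    b, r = baseline[name], reduced[name]
    print(
        f"{name:15s} accuracy {b['accuracy']:.3f} -> {r['accuracy']:.3f}, "
        f"time {b['seconds']:.3f}s -> {r['seconds']:.3f}s"
    )
```

Placing the selector inside the pipeline means the features are chosen from the training split only, which avoids leaking test information into the selection step. Results on synthetic data say nothing about the accuracies reported in the paper.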

Source journal

Journal of Big Data Computer Science-Information Systems

CiteScore

17.80

Self-citation rate

3.70%

Articles published

105

Review time

13 weeks

About the journal: The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.