Revolutionizing gastric cancer diagnosis through advanced machine learning approaches

Journal of Autonomous Intelligence, Pub Date: 2024-03-04, DOI: 10.32629/jai.v7i4.1021

Danish Jamil, S. Palaniappan, Muhammad Numan Ali Khan, Syed Mehr Ali Shah



Abstract

Early detection of gastric cancer through a Computer-Aided Detection (CAD) system has the potential to significantly reduce the mortality rate associated with this disease. This study investigates the effects of class imbalance on the performance of machine learning classifiers in this context. Using a dataset of 145,787 screening records from NHS Liverpool Hospital, we employed stratified sampling to create balanced and unbalanced datasets and evaluated four machine learning algorithms, namely Logistic Regression, Support Vector Machine, Naive Bayes, and Multilayer Perceptron (MLP), under five different test conditions. The study's novelty lies in its detailed examination of class imbalance in gastric cancer diagnosis, emphasizing the crucial role of balanced datasets in machine learning-based early detection systems. For the MLP model under 10-fold cross-validation, the Class 0 sensitivity (non-cancer cases, i.e., specificity with respect to cancer) was 0.968 on the unbalanced dataset, higher than the 0.902 obtained on the balanced dataset. However, the Class 1 sensitivity (cancer cases) and Positive Predictive Value (PPV) were much lower on the unbalanced dataset (0.383 and 0.527) than on the balanced dataset (0.959 and 0.907), indicating a significant improvement in identifying true positive cases when using a balanced dataset. These findings highlight the negative effect of class imbalance on prediction accuracy for positive cancer cases and underscore the importance of addressing this imbalance for more reliable and accurate predictions in medical diagnosis and screening. This approach has the potential to improve patient outcomes and may contribute to strategies aimed at reducing the mortality rate associated with gastric cancer.
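The per-class sensitivity and PPV figures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy labels, the helper names `per_class_metrics` and `undersample_majority`, and the use of random undersampling to balance the classes are all assumptions made for illustration (the abstract says only that stratified sampling was used). The sketch shows why a classifier can score well on Class 0 while missing every Class 1 case when the data are heavily imbalanced.

```python
import random
from collections import Counter


def per_class_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, PPV) treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv


def undersample_majority(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Hypothetical balancing step: every class is randomly reduced to the
    size of the smallest class. This is an assumption, not the paper's method.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    smallest = min(len(indices) for indices in by_class.values())
    keep = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, smallest))
    return sorted(keep)


# Toy, made-up labels: 95 non-cancer (0) and 5 cancer (1) records.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

class0 = per_class_metrics(y_true, y_pred, positive=0)
class1 = per_class_metrics(y_true, y_pred, positive=1)
print("Class 0 (sensitivity, PPV):", class0)
print("Class 1 (sensitivity, PPV):", class1)

# Balance the toy dataset before any (re)training would take place.
balanced_idx = undersample_majority(y_true)
print("Balanced class counts:", Counter(y_true[i] for i in balanced_idx))
```

On the toy data the always-negative classifier reaches a Class 0 sensitivity of 1.0 while its Class 1 sensitivity is 0.0, which is the same pattern, in exaggerated form, as the gap the abstract reports between the unbalanced and balanced datasets.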


