Optimised feature selection and cervical cancer prediction using Machine learning classification

Q4 Medicine

Scripta Medica Pub Date : 2022-01-01 DOI:10.5937/scriptamed53-38848

A. Tak, P. Parihar, Fatehpuriya Singh, Yogesh Singh


Citations: 0

Abstract

Background: Screening and early detection play a key role in cervical cancer prevention. The present study uses machine learning algorithms to predict the outcomes of several diagnostic tests for cervical cancer.

Methods: Cervical cancer risk factors were used as inputs to machine learning (ML) classifiers to predict the outcomes of the Hinselmann test, the Schiller test, cytology and biopsy. The dataset is publicly available from the Machine Learning Repository of the University of California, Irvine. Because the dataset was imbalanced, it was pre-processed using oversampling methods. Features that differed significantly between the two levels of a response variable were used to train the classifiers in MATLAB. The classifiers were Decision Trees, Support Vector Machines, K-Nearest Neighbours and Ensemble learning classifiers. Performance was reported as accuracy, area under the receiver operating characteristic curve (AU-ROC), sensitivity and specificity.

Results: The Fine Gaussian SVM classifier performed best for Hinselmann, cytology and biopsy, with accuracies of 97.5%, 62.5% and 98%, respectively. Boosted trees performed best for Schiller, with 81.3% accuracy.

Conclusion: The present study selected optimised features from multiple risk factors to train various ML classifiers to predict cervical cancer.
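The abstract names two steps that are easy to misread without a concrete form: balancing the classes by oversampling, and scoring a binary classifier by accuracy, sensitivity, specificity and AU-ROC. The sketch below illustrates both in plain Python. It is not the authors' MATLAB pipeline. The abstract does not say which oversampling method was used, so random duplication of minority-class rows is an assumption, and the function names `random_oversample`, `binary_metrics` and `roc_auc` are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size.

    Assumption: the paper only says "oversampling methods"; random
    duplication is used here purely for illustration.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels coded 1 (positive) and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Specificity: share of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def roc_auc(y_true, scores):
    """AU-ROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and intended
    for small illustrative inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a setup like this, oversampling is normally applied to the training split only, so that duplicated rows do not also appear in the data used for evaluation. The abstract does not state how the authors handled this.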
