Optimised feature selection and cervical cancer prediction using Machine learning classification

Q4 Medicine
A. Tak, P. Parihar, Fatehpuriya Singh, Yogesh Singh
Scripta Medica, 2022. DOI: 10.5937/scriptamed53-38848 (https://doi.org/10.5937/scriptamed53-38848)

Abstract

Background: Screening and early detection play a key role in cervical cancer prevention. This study uses machine learning algorithms to predict the outcomes of diagnostic tests for cervical cancer. Methods: Cervical cancer risk factors were used as inputs to machine learning (ML) classifiers to predict the outcomes of four tests: Hinselmann, Schiller, cytology and biopsy. The dataset is publicly available from the Machine Learning Repository of the University of California, Irvine. Because the dataset is imbalanced, it was pre-processed with oversampling methods. Features that differed significantly between the two levels of a response variable were used to train the classifiers in MATLAB. The classifiers were Decision Trees, Support Vector Machines, K-Nearest Neighbours and Ensemble learning classifiers. Performance was reported as accuracy, area under the receiver operating characteristic curve (AU-ROC), sensitivity and specificity. Results: The Fine Gaussian SVM classifier performed best for Hinselmann, cytology and biopsy, with accuracies of 97.5 %, 62.5 % and 98 %, respectively. For Schiller, Boosted Trees performed best, with 81.3 % accuracy. Conclusion: The study selected optimised features from multiple risk factors to train various ML classifiers to predict cervical cancer.
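
The abstract names two steps that a short example can make concrete: oversampling an imbalanced dataset, and reporting accuracy, sensitivity and specificity. The following is a minimal Python sketch under stated assumptions, not the authors' MATLAB pipeline. The abstract does not say which oversampling method was used, so simple random duplication of minority-class rows is assumed here. The function names `random_oversample` and `binary_metrics` and the toy data are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all
    classes have as many rows as the largest class.
    Assumption: plain random oversampling; the study's method is not stated."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity and specificity for a binary outcome."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Specificity: share of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Toy data, made up for illustration: 1 = positive test result, 0 = negative.
rows = [[25, 1], [31, 0], [40, 2], [22, 0], [35, 3], [29, 1]]
labels = [0, 0, 0, 0, 1, 0]
bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)

# Hypothetical true labels and classifier predictions.
metrics = binary_metrics([1, 1, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1])
```

In practice, oversampling is usually applied to the training portion only, because duplicated rows that leak into the test set can inflate the reported accuracy.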