Impact of Classification Algorithms on Cervical Cancer Dataset

N. Sangavi, V. R. Kiruthika, K. Premalatha
Published in: 2022 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), 2022-04-07. DOI: 10.1109/ICSCDS53736.2022.9760715
Citations: 0

Abstract

Data mining is now used across a wide range of domains to extract knowledge from the patterns present in datasets. In the medical field, cancer is among the most widespread diseases worldwide, and cervical cancer is a cancer that primarily affects women. To analyze its symptoms effectively and support prevention, cervical cancer in women was analyzed using several classification algorithms: neural network, decision tree, random forest, SVM, and linear regression. Data preprocessing and feature selection were performed on the features present in the dataset. The target variable of the cervical cancer dataset is whether or not the person has cervical cancer. The performance of each classifier was measured by accuracy, specificity, sensitivity, recall, and F-measure, all calculated from the confusion-matrix values: true positives, true negatives, false positives, and false negatives. Based on these performance measures, Random Forest emerged as the best-suited model among those evaluated, with 80% accuracy.
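The abstract says every performance measure was derived from the confusion-matrix counts. As a minimal sketch (not the authors' code), the standard definitions can be written in Python as below, assuming a binary target encoded as 1 = has cervical cancer and 0 = does not. The names `ConfusionCounts`, `confusion_counts`, and `metrics` are hypothetical. Under these standard definitions, sensitivity and recall are the same quantity, TP / (TP + FN), so the sketch reports them with the same value.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for a binary classifier."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> ConfusionCounts:
    """Tally TP/TN/FP/FN; `positive` is the label treated as 'has cervical cancer' (assumed 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, tn, fp, fn)


def _ratio(num: float, den: float) -> float:
    # Return 0.0 rather than raising ZeroDivisionError when a denominator is empty.
    return num / den if den else 0.0


def metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard performance measures computed from confusion-matrix counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # true-positive rate; identical to recall
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true-negative rate
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "precision": precision,
        # F-measure (F1): harmonic mean of precision and recall.
        "f_measure": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Hypothetical toy labels (1 = cervical cancer, 0 = none), not data from the paper.
    y_true = [1, 1, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
    counts = confusion_counts(y_true, y_pred)
    print(counts)  # ConfusionCounts(tp=2, tn=4, fp=1, fn=1)
    for name, value in metrics(counts).items():
        print(f"{name}: {value:.3f}")
```

Reporting specificity and sensitivity alongside accuracy matters when the positive class is rare. A model that always predicts "no cancer" can still reach high accuracy while detecting no positive cases at all.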