The Role of Linear Discriminant Analysis for Accurate Prediction of Breast Cancer

2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). Pub Date: 2021-12-01. DOI: 10.1109/MCSoC51149.2021.00057

Egwom Onyinyechi Jessica, Mohamed Hamada, S. Yusuf, Mohammed Hassan


Citations: 5

Abstract

Recent advances in clinical technologies have produced a large volume of data for breast cancer diagnosis. Extracting information from these data to support clinical diagnosis is tedious and time-consuming, and machine learning and data mining techniques have significantly changed the diagnostic process. In this research, a breast cancer prediction model is developed using features extracted from individual medical screenings and tests. To overcome overfitting and obtain good prediction accuracy, Linear Discriminant Analysis (LDA) is applied to extract useful features, which reduces the number of features in the experimental dataset. The proposed model creates new features from the existing features and then discards the originals; the new features summarize the necessary information contained in the original feature set. LDA was chosen because it is useful for detecting whether a set of features is worthwhile for predicting breast cancer. In addition to LDA, the proposed model uses a Support Vector Machine (SVM) for prediction, hence the name LDA-SVM prediction model. Based on 5-fold cross-validation, the proposed model yields an accuracy of 99.2%, a precision of 98.0%, and a recall of 99.0% when tested on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the University of California, Irvine machine learning repository. Therefore, SVM shows high efficiency in handling classification problems when combined with feature extraction techniques.
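
The abstract does not spell out how LDA derives its new features or how the reported metrics are computed. As a rough sketch under stated assumptions, the two-class Fisher form of LDA reduces each sample to one derived feature (its projection onto a discriminant direction), and accuracy, precision, and recall follow from the confusion counts. Everything here is illustrative: the points are invented 2-D data rather than WDBC, the function names are hypothetical, and a simple midpoint threshold stands in for the SVM and the 5-fold cross-validation used in the paper.

```python
# Toy illustration only: invented 2-D points, not the WDBC data.
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of 2-D points."""
    return [fmean(col) for col in zip(*rows)]


def fisher_direction(neg, pos):
    """Two-class LDA direction w = S_w^{-1} (mu_pos - mu_neg) for 2-D data."""
    mu_n, mu_p = mean_vector(neg), mean_vector(pos)
    # Within-class scatter: sum of outer products of deviations from class means.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, mu in ((neg, mu_n), (pos, mu_p)):
        for x in rows:
            d = (x[0] - mu[0], x[1] - mu[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = (mu_p[0] - mu_n[0], mu_p[1] - mu_n[1])
    # Multiply the closed-form 2x2 inverse of S_w by the mean difference.
    return [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
            (s[0][0] * diff[1] - s[1][0] * diff[0]) / det]


def project(w, x):
    """Derived 1-D feature: the projection of x onto w."""
    return w[0] * x[0] + w[1] * x[1]


def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) with class 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical training points for the negative (0) and positive (1) class.
train_neg = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
train_pos = [(5.0, 6.0), (6.0, 5.0), (6.0, 7.0), (7.0, 6.0)]

w = fisher_direction(train_neg, train_pos)
# Midpoint between the projected class means; a stand-in for the paper's SVM.
threshold = (project(w, mean_vector(train_neg))
             + project(w, mean_vector(train_pos))) / 2

# Hypothetical held-out points, two of which fall on the wrong side.
test_points = [(2.0, 2.0), (3.0, 4.0), (5.0, 4.0),
               (6.0, 6.0), (5.0, 5.0), (4.0, 3.0)]
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1 if project(w, x) > threshold else 0 for x in test_points]

accuracy, precision, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

With real data, the projected feature would be passed to a classifier such as an SVM, and the three metrics would be averaged over the cross-validation folds rather than computed on a single split.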
