Predicting lung cancer survivability using ensemble learning methods

2017 Intelligent Systems Conference (IntelliSys). Pub Date: 2017-09-01. DOI: 10.1109/INTELLISYS.2017.8324368

Ali Safiyari, R. Javidan

Citations: 20

Abstract

Ensemble methods are powerful machine learning techniques that combine multiple classifiers to improve the prediction accuracy of classifier learning systems. In this study, different ensemble learning methods for lung cancer survival prediction were evaluated on the Surveillance, Epidemiology, and End Results (SEER) dataset. The data were preprocessed in several steps before the classification models were applied. Five popular ensemble methods (Bagging, Dagging, AdaBoost, MultiBoosting, and Random SubSpace) were evaluated for lung cancer survival prediction, using eight classification algorithms (RIPPER, Decision Stump, Simple Cart, C4.5, SMO, Logistic Regression, Bayes Net, and Random Forest) as base classifiers. The risk of mortality 5 years after diagnosis was then estimated. Prediction performance was measured in terms of accuracy and the area under the ROC curve (AUC). Compared with the other four ensemble methods, AdaBoost was the most effective at improving the performance of the base classifiers. It increased the accuracy of RIPPER from 88.88% to 88.98%, of Decision Stump from 81.21% to 87.67%, and of SMO from 83.41% to 87.16%. AdaBoost also increased the AUC of RIPPER from 91.5% to 94.9%, of Decision Stump from 81.2% to 93.9%, of J48 (Weka's implementation of C4.5) from 94.1% to 94.9%, and of SMO from 50.0% to 92.1%. Random SubSpace performed worst among the ensemble techniques used in this study. The results show empirically that ensemble methods can improve the performance of their base classifiers and are appropriate methods for cancer survival analysis.
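The Decision Stump result above (accuracy 81.21% to 87.67%, AUC 81.2% to 93.9% under AdaBoost) can be illustrated with a minimal sketch of discrete AdaBoost over decision stumps in plain Python. This is an illustration under stated assumptions, not the authors' setup: the data set is synthetic rather than SEER, the helper names (`make_data`, `fit_stump`, `adaboost`, `score`, `accuracy`, `auc`) are hypothetical, and the 20 boosting rounds are an arbitrary choice. AUC is computed with the rank-based (Mann-Whitney) definition.

```python
import math
import random

# Hypothetical toy data, NOT the SEER data set: two features in [-1, 1],
# labels in {-1, +1}, separated by a diagonal line that no single
# axis-aligned stump can fit exactly.
def make_data(n, seed):
    rng = random.Random(seed)
    X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    y = [1 if a + b > 0 else -1 for a, b in X]
    return X, y


def stump_predict(stump, x):
    """Decision stump: predict `polarity` if x[feature] > threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds):
    """Discrete AdaBoost with decision stumps; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    model = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:                     # weak learner no better than chance
            break
        err = max(err, 1e-10)              # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Misclassified samples (y * h(x) = -1) gain weight; correct ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Weighted vote of the stumps; its sign is the predicted class."""
    return sum(alpha * stump_predict(stump, x) for alpha, stump in model)


def accuracy(scores, y):
    return sum((1 if s > 0 else -1) == yi for s, yi in zip(scores, y)) / len(y)


def auc(scores, y):
    """P(random positive outscores random negative), counting ties as 1/2."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == -1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


X_train, y_train = make_data(100, seed=1)
X_test, y_test = make_data(500, seed=2)

single = adaboost(X_train, y_train, rounds=1)    # one round = a plain decision stump
boosted = adaboost(X_train, y_train, rounds=20)  # 20 rounds: an arbitrary illustrative choice

for name, model in (("Decision Stump", single), ("AdaBoost + Decision Stump", boosted)):
    s = [score(model, x) for x in X_test]
    print(f"{name}: accuracy={accuracy(s, y_test):.3f}, AUC={auc(s, y_test):.3f}")
```

A single stump emits only two distinct scores, so its ROC curve has just one non-trivial operating point; the weighted vote of many stumps yields a finer-grained ranking score, which is one general reason boosting can improve AUC even when accuracy changes little. The numbers this toy prints depend entirely on the synthetic data and say nothing about the SEER results.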
