Construction of an automated screening system to predict breast cancer diagnosis and prognosis

Basic and applied pathology, vol. 5, no. 1, pp. 15-18. Pub Date: 2012-03-15. DOI: 10.1111/j.1755-9294.2012.01124.x

Sou-Young Jin, Jae-Kyung Won, Hojin Lee, Ho-Jin Choi


Citations: 3

Abstract

Background and aim: Machine learning methods can support clinical decision processes such as pathological diagnosis when applied to datasets of microscopic features. In the present study, using the Breast Cancer Wisconsin dataset, we compared several classification algorithms to devise an optimal classifier that predicts both diagnosis (benign vs malignant) and prognosis (recurrence vs non-recurrence). Methods: The performance of a two-step algorithm, which decides diagnosis and then prognosis in sequence, was compared with that of a one-step multi-class classifier, which separates all classes simultaneously. Results: In the two-step classifier, the functional trees (FT) algorithm performed best for the first classification step, and Naïve Bayes performed best for the second. The one-step classifier showed better accuracy and better prediction of benign and non-recurring cases than the two-step classifier, but lower accuracy in predicting recurring cases, leading to lower sensitivity. Conclusions: We conclude that the two-step classifier with FT and Naïve Bayes is better than the one-step classifier. This work should help in setting up an automated screening system in real clinics, and it highlights ways to improve accuracy by refining the data and the choice of algorithm in data mining or machine learning processes.
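The two designs compared in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. The nearest-centroid model is a toy stand-in for the FT and Naïve Bayes classifiers. The three labels ("benign", "non-recur", "recur"), the class and function names, and the two-feature sample rows are all hypothetical and chosen only to make the example runnable. Sensitivity for recurring cases is computed here as recall on the "recur" label, which is one plausible reading of the abstract.

```python
from collections import defaultdict


class NearestCentroid:
    """Toy stand-in classifier (NOT the FT or Naive Bayes models from the study)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] += 1
        # Mean feature vector per class.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, row):
        def dist2(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))

        return min(self.centroids, key=lambda label: dist2(self.centroids[label]))


class TwoStepClassifier:
    """Step 1 decides benign vs malignant; step 2 decides recur vs non-recur."""

    def __init__(self, make_step1, make_step2):
        self.step1 = make_step1()
        self.step2 = make_step2()

    def fit(self, X, y):
        # Step 1 sees every case, with the two malignant labels merged.
        diagnosis = ["benign" if label == "benign" else "malignant" for label in y]
        self.step1.fit(X, diagnosis)
        # Step 2 is trained on malignant cases only.
        malignant = [(row, label) for row, label in zip(X, y) if label != "benign"]
        self.step2.fit([r for r, _ in malignant], [l for _, l in malignant])
        return self

    def predict_one(self, row):
        if self.step1.predict_one(row) == "benign":
            return "benign"
        return self.step2.predict_one(row)


def recall(y_true, y_pred, label):
    """Sensitivity for one class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp / (tp + fn) if tp + fn else float("nan")


# Hypothetical two-feature rows, invented for illustration only.
X = [[1, 1], [1, 2], [2, 1], [8, 2], [9, 1], [8, 1], [8, 8], [9, 9], [9, 8]]
y = ["benign"] * 3 + ["non-recur"] * 3 + ["recur"] * 3

two_step = TwoStepClassifier(NearestCentroid, NearestCentroid).fit(X, y)
one_step = NearestCentroid().fit(X, y)  # single multi-class model over all labels

two_step_pred = [two_step.predict_one(row) for row in X]
one_step_pred = [one_step.predict_one(row) for row in X]
print("two-step recall on recur:", recall(y, two_step_pred, "recur"))
print("one-step recall on recur:", recall(y, one_step_pred, "recur"))
```

Because each step takes a factory, a different model can be plugged into each stage, which mirrors the abstract's finding that the best algorithm differed between the first and second steps.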

