{"title":"Rotation Forest and Random Oracles: Two Classifier Ensemble Methods","authors":"Juan José Rodríguez Diez","doi":"10.1109/CBMS.2007.94","DOIUrl":null,"url":null,"abstract":"Classification methods are widely used in computer-based medical systems. Often, the accuracy of a classifier can be improved using a classifier ensemble, the combination of several classifiers. Two classifiers ensembles and their results on several medical data sets will be presented: Rotation Forest (Rodriguez, Kuncheva and Alonso) and Random Oracles (Kuncheva and Rodriguez). Rotation Forest is a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage simultaneously individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name \"forest.\" Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier. Comparisons with various standard ensemble methods (Bagging, AdaBoost, and Random Forest) will be reported. Diversity-error diagrams reveal that Rotation Forest ensembles construct individual classifiers which are more accurate than these in AdaBoost and Random Forest and more diverse than these in Bagging, sometimes more accurate as well. A random oracle classifier is a mini-ensemble formed by a pair of classifiers and a fixed, randomly created oracle that selects between them. 
The random oracle can be thought of as a random discriminant function which splits the data into two subsets with no regard of any class labels or cluster structure. Two random oracles has been considered: linear and spherical. A random oracle classifier can be used as the base classifier of any ensemble method. It is argued that this approach encourages extra diversity in the ensemble while allowing for high accuracy of the individual ensemble members. Experiments with several data sets from UCI and 11 ensemble models will be reported. Each ensemble model will be examined with and without the oracle. The results will show that all ensemble methods benefited from the new approach, most markedly so random subspace and bagging. A further experiment with seven real medical data sets will demonstrate the validity of these findings outside the UCI data collection. When using Naive Bayes Classifiers as base classifiers, the experiments show that ensembles based solely upon the spherical oracle (and no other ensemble heuristic) outrank Bagging, Wagging, Random Subspaces, AdaBoost.Ml, MultiBoost and Decorate. Moreover, all these ensemble methods are better with any of the two random oracles than their standard versions without the oracles.","PeriodicalId":74567,"journal":{"name":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","volume":"178 1","pages":"3"},"PeriodicalIF":0.0000,"publicationDate":"2007-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CBMS.2007.94","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 11
Abstract
Classification methods are widely used in computer-based medical systems. Often, the accuracy of a classifier can be improved using a classifier ensemble, a combination of several classifiers. Two classifier ensembles and their results on several medical data sets will be presented: Rotation Forest (Rodriguez, Kuncheva and Alonso) and Random Oracles (Kuncheva and Rodriguez). Rotation Forest is a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to simultaneously encourage individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction performed for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and also by using the whole data set to train each base classifier. Comparisons with several standard ensemble methods (Bagging, AdaBoost, and Random Forest) will be reported. Diversity-error diagrams reveal that Rotation Forest ensembles construct individual classifiers which are more accurate than those in AdaBoost and Random Forest, and more diverse than those in Bagging, sometimes more accurate as well. A random oracle classifier is a mini-ensemble formed by a pair of classifiers and a fixed, randomly created oracle that selects between them.
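The Rotation Forest construction described above (random split of the feature set into K subsets, PCA on each subset with all components retained, assembly into one axis rotation per base classifier) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name is hypothetical:

```python
import numpy as np

def rotation_matrix(X, K, rng):
    """Build one Rotation Forest rotation: randomly partition the d features
    into K subsets, run PCA on each subset keeping all components, and place
    the loadings as blocks of a d x d rotation matrix."""
    n, d = X.shape
    subsets = np.array_split(rng.permutation(d), K)  # random feature split
    R = np.zeros((d, d))
    for subset in subsets:
        cov = np.atleast_2d(np.cov(X[:, subset], rowvar=False))
        _, vecs = np.linalg.eigh(cov)      # all principal components retained
        R[np.ix_(subset, subset)] = vecs   # block for this feature subset
    return R                               # train the base tree on X @ R

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
R = rotation_matrix(X, K=3, rng=rng)
```

Because each block holds the orthonormal eigenvectors of its subset's covariance, the assembled matrix is itself orthogonal, so `X @ R` is a pure rotation of the feature axes, which is exactly what destabilizes (diversifies) axis-parallel decision trees.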
The random oracle can be thought of as a random discriminant function which splits the data into two subsets with no regard to class labels or cluster structure. Two random oracles have been considered: linear and spherical. A random oracle classifier can be used as the base classifier of any ensemble method. It is argued that this approach encourages extra diversity in the ensemble while allowing for high accuracy of the individual ensemble members. Experiments with several data sets from UCI and 11 ensemble models will be reported. Each ensemble model will be examined with and without the oracle. The results will show that all ensemble methods benefited from the new approach, most markedly Random Subspace and Bagging. A further experiment with seven real medical data sets will demonstrate the validity of these findings outside the UCI data collection. When using naive Bayes classifiers as base classifiers, the experiments show that ensembles based solely upon the spherical oracle (and no other ensemble heuristic) outrank Bagging, Wagging, Random Subspaces, AdaBoost.M1, MultiBoost and Decorate. Moreover, all of these ensemble methods are better with either of the two random oracles than their standard versions without the oracles.
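The linear random oracle can be sketched as below: a label-blind hyperplane (the perpendicular bisector of the segment between two randomly chosen training points) routes each instance to one of two sub-classifiers; a spherical oracle would instead threshold the distance to a random centre. This is an illustrative sketch, not the authors' implementation; the class names and the tiny nearest-centroid stand-in base classifier are hypothetical:

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in base classifier (one centroid per class)."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centroids = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        return self
    def predict(self, X):
        d2 = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=-1)
        return self.classes[d2.argmin(axis=1)]

class LinearOracleClassifier:
    """Mini-ensemble: a fixed random linear oracle selecting between a pair
    of classifiers, trained one per side of the random hyperplane."""
    def __init__(self, base, rng):
        self.base, self.rng = base, rng

    def fit(self, X, y):
        i, j = self.rng.choice(len(X), size=2, replace=False)
        a, b = X[i], X[j]
        self.w = b - a                      # hyperplane normal
        self.t = self.w @ (a + b) / 2.0     # plane through the midpoint
        side = X @ self.w > self.t          # label-blind random split
        self.models = [self.base().fit(X[m], y[m]) for m in (side, ~side)]
        return self

    def predict(self, X):
        side = X @ self.w > self.t          # the oracle routes each instance
        out = np.empty(len(X), dtype=int)
        for model, mask in zip(self.models, (side, ~side)):
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(4.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)
clf = LinearOracleClassifier(NearestCentroid, rng).fit(X, y)
acc = (clf.predict(X) == y).mean()
```

Because such an oracle classifier exposes the same fit/predict interface as its base learner, it can be dropped in as the base classifier of Bagging, Random Subspace, AdaBoost or any other ensemble method, which is how the reported experiments combine the two ideas.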