N. Chawla, Thomas E. Moore, K. Bowyer, L. Hall, C. Springer, W. Kegelmeyer
{"title":"Bagging Is a Small-Data-Set Phenomenon","authors":"N. Chawla, Thomas E. Moore, K. Bowyer, L. Hall, C. Springer, W. Kegelmeyer","doi":"10.1109/CVPR.2001.991030","DOIUrl":null,"url":null,"abstract":"Bagging forms a committee of classifiers by bootstrap aggregation of training sets from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments on various datasets show that, given the same size partitions and bags, disjoint partitions result in better performance than bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve the use of datasets that are too large to handle in the memory of a typical computer. Our results indicate that, in such applications, the simple approach of creating a committee of classifiers from disjoint partitions is preferred over the more complex approach of bagging.","PeriodicalId":89346,"journal":{"name":"Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops","volume":"143 1","pages":"684-689"},"PeriodicalIF":0.0000,"publicationDate":"2001-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2001.991030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14
Abstract
Bagging forms a committee of classifiers by bootstrap aggregation of training sets from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments on various datasets show that, given the same size partitions and bags, disjoint partitions result in better performance than bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve the use of datasets that are too large to handle in the memory of a typical computer. Our results indicate that, in such applications, the simple approach of creating a committee of classifiers from disjoint partitions is preferred over the more complex approach of bagging.