Selecting Machine Learning Algorithms Using Regression Models

2015 IEEE International Conference on Data Mining Workshop (ICDMW). Pub Date: 2015-11-14. DOI: 10.1109/ICDMW.2015.43

Tri Doan, J. Kalita

Citations: 32

Abstract

In performing data mining, a common task is to search for the most appropriate algorithm(s) to retrieve important information from data. With an increasing number of available data mining techniques, it may be impractical to experiment with many techniques on a specific dataset of interest to find the best algorithm(s). In this paper, we demonstrate the suitability of tree-based multi-variable linear regression in predicting algorithm performance. We take into account prior machine learning experience to construct meta-knowledge for supervised learning. The idea is to use summary knowledge about datasets along with past performance of algorithms on these datasets to build this meta-knowledge. We augment pure statistical summaries with descriptive features and a misclassification cost, and discover that transformed datasets obtained by reducing a high dimensional feature space to a smaller dimension still retain significant characteristic knowledge necessary to predict algorithm performance. Our approach works well for both numerical and nominal data obtained from real world environments.
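The meta-learning idea in the abstract (summarize each dataset, record how an algorithm performed on it, then fit a regression model that predicts performance on an unseen dataset) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the three meta-features, the synthetic toy datasets, the majority-class baseline that stands in for a real learner's measured accuracy, and a depth-1 regression tree with constant leaves that is swapped in for the paper's tree-based multi-variable linear regression. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean


def meta_features(rows, labels):
    """Summarize one dataset as a small meta-feature vector (illustrative choice)."""
    n, d = len(rows), len(rows[0])
    counts = Counter(labels)
    # Shannon entropy of the class distribution, in bits.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [math.log10(n), float(d), entropy]


def make_toy_dataset(rng, n, d, p_positive):
    """Generate a synthetic binary dataset (assumption: stands in for real data)."""
    rows = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    labels = [1 if rng.random() < p_positive else 0 for _ in range(n)]
    return rows, labels


def majority_accuracy(labels):
    """Accuracy of a majority-class baseline; a stand-in for past performance."""
    return max(Counter(labels).values()) / len(labels)


def fit_stump(X, y):
    """Fit a depth-1 regression tree: choose the (feature, threshold) split
    that minimizes the summed squared error around the two leaf means."""
    best = None
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for x, yi in zip(X, y) if x[j] <= t]
            right = [yi for x, yi in zip(X, y) if x[j] > t]
            m_left, m_right = mean(left), mean(right)
            sse = (sum((v - m_left) ** 2 for v in left)
                   + sum((v - m_right) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, m_left, m_right)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]  # (feature index, threshold, left mean, right mean)


def predict(stump, x):
    """Predict performance for one meta-feature vector."""
    j, t, m_left, m_right = stump
    return m_left if x[j] <= t else m_right


# Build the meta-dataset: one row of meta-features per past dataset,
# with the measured baseline accuracy as the regression target.
rng = random.Random(0)
meta_X, meta_y = [], []
for _ in range(30):
    rows, labels = make_toy_dataset(
        rng,
        n=rng.choice([50, 100, 200]),
        d=rng.choice([2, 5, 10]),
        p_positive=rng.uniform(0.1, 0.5),
    )
    meta_X.append(meta_features(rows, labels))
    meta_y.append(majority_accuracy(labels))

stump = fit_stump(meta_X, meta_y)

# Predict performance on an unseen dataset without running the algorithm on it.
new_rows, new_labels = make_toy_dataset(rng, n=100, d=5, p_positive=0.3)
predicted = predict(stump, meta_features(new_rows, new_labels))
print(f"predicted accuracy: {predicted:.3f}")
```

In a realistic setting the target column would hold measured performance of actual learners on real datasets, and the stump would be replaced by a richer tree-based regressor; the structure of the meta-dataset (one row per dataset, meta-features as inputs, performance as output) is the part this sketch is meant to convey.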


