TRIO: Task-agnostic dataset representation optimized for automatic algorithm selection

2021 IEEE International Conference on Data Mining (ICDM) | Pub Date: 2021-12-01 | DOI: 10.1109/ICDM51629.2021.00018

Noy Cohen-Shapira, L. Rokach


Citations: 3

Abstract

With the growing number of machine learning (ML) algorithms, the selection of the top-performing algorithms for a given dataset, task, and evaluation measure is known to be a challenging task. The human expertise required for this task has fueled the demand for automatic solutions. Meta-learning is a popular approach for automatic algorithm selection based on dataset characterization. Existing meta-learning methods often represent the datasets using predefined features and thus cannot be generalized for various ML tasks, or alternatively, learn their representations in a supervised fashion, and thus cannot address unsupervised tasks. In this study, we first propose a novel learning-based task-agnostic method for dataset representation. Second, we present TRIO, a meta-learning approach based on the proposed dataset representation, which is capable of accurately recommending top-performing algorithms for unseen datasets. TRIO first learns graphical representations from the datasets and then utilizes a graph convolutional neural network technique to extract their latent representations. An extensive evaluation on 337 datasets and 195 ML algorithms demonstrates the effectiveness of our approach over state-of-the-art methods for algorithm selection for both supervised (classification and regression) and unsupervised (clustering) tasks.
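The two-stage idea in the abstract (build a graph from a dataset, then apply graph convolution to obtain a latent representation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the k-nearest-neighbour graph over instances, the single untrained graph-convolution layer with random weights, the mean pooling, and the helper names `knn_adjacency`, `gcn_layer`, and `embed_dataset`. It shows only how datasets of different shapes could be mapped to a fixed-length vector without using labels.

```python
import numpy as np


def knn_adjacency(X, k=3):
    """Symmetric k-nearest-neighbour adjacency over the rows of X.

    Assumed graph construction for illustration; requires k < number of rows.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between instances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbours
    nbrs = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)  # make the graph undirected


def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


def embed_dataset(X, dim=8, k=3, seed=0):
    """Map a numeric dataset (rows = instances) to a vector of length `dim`."""
    X = np.asarray(X, dtype=float)
    # Standardise columns so distances are not dominated by one feature.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
    A = knn_adjacency(X, k)
    # Untrained random weights (assumption); a real model would learn these.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], dim))
    H = gcn_layer(A, X, W)
    return H.mean(axis=0)  # mean-pool node embeddings into one dataset vector


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    small = embed_dataset(rng.normal(size=(20, 4)))
    large = embed_dataset(rng.normal(size=(35, 7)))
    print(small.shape, large.shape)
```

Because the sketch never touches a target column, the same function applies to classification, regression, and clustering datasets alike. In a meta-learning setting, vectors of this kind would be the input to a separate model that ranks candidate algorithms; that step is not shown here.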


