TRIO: Task-agnostic dataset representation optimized for automatic algorithm selection

Noy Cohen-Shapira, L. Rokach
Published in: 2021 IEEE International Conference on Data Mining (ICDM), 2021-12-01
DOI: 10.1109/ICDM51629.2021.00018 (https://doi.org/10.1109/ICDM51629.2021.00018)
Citations: 3

Abstract

With the growing number of machine learning (ML) algorithms, selecting the top-performing algorithms for a given dataset, task, and evaluation measure is known to be challenging. The human expertise this selection requires has fueled the demand for automatic solutions. Meta-learning is a popular approach to automatic algorithm selection based on dataset characterization. Existing meta-learning methods often represent datasets using predefined features, and thus do not generalize across ML tasks, or alternatively learn their representations in a supervised fashion, and thus cannot address unsupervised tasks. In this study, we first propose a novel learning-based, task-agnostic method for dataset representation. Second, we present TRIO, a meta-learning approach based on the proposed dataset representation, which is capable of accurately recommending top-performing algorithms for unseen datasets. TRIO first learns graphical representations from the datasets and then uses a graph convolutional neural network to extract their latent representations. An extensive evaluation on 337 datasets and 195 ML algorithms demonstrates the effectiveness of our approach over state-of-the-art algorithm selection methods on both supervised (classification and regression) and unsupervised (clustering) tasks.
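
To make the idea of "graph first, graph convolution second" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how TRIO builds its graphs or trains its network, so the k-nearest-neighbour graph over instances, the single untrained layer with random weights, and the mean pooling are all assumptions made for illustration. The names `knn_adjacency`, `gcn_layer`, and `embed_dataset` are hypothetical. Note also that the weight matrix here is tied to the number of columns, a simplification that a genuinely task-agnostic method would have to avoid.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def knn_adjacency(rows, k):
    """Symmetric k-nearest-neighbour adjacency matrix with self-loops (assumed graph)."""
    n = len(rows)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        dists = sorted((math.dist(rows[i], rows[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj, feats, weights):
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W)."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # at least 1 because of the self-loops
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weights)
    return [[max(0.0, v) for v in row] for row in out]


def embed_dataset(rows, weights, k=2):
    """Mean-pool node states into one fixed-length vector for the whole dataset."""
    hidden = gcn_layer(knn_adjacency(rows, k), rows, weights)
    return [sum(col) / len(hidden) for col in zip(*hidden)]


# Toy dataset: 4 instances, 2 numeric features (made-up values).
rows = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]]
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # untrained
embedding = embed_dataset(rows, weights, k=1)
print(len(embedding))  # length of the dataset-level vector
```

Because the pooled vector has the same length regardless of how many instances the dataset contains, a downstream meta-learner could in principle consume it to rank candidate algorithms. In the paper the representation is learned rather than random, which this sketch does not attempt.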