{"title":"Performance Analysis of Self-Supervised Strategies for Standard Genetic Programming","authors":"Nuno M. Rodrigues, J. Almeida, Sara Silva","doi":"10.1145/3583133.3590748","DOIUrl":null,"url":null,"abstract":"Self-supervised learning (SSL) methods have been widely used to train deep learning models for computer vision and natural language processing domains. They leverage large amounts of unlabeled data to help pretrain models by learning patterns implicit in the data. Recently, new SSL techniques for tabular data have been developed, using new pretext tasks that typically aim to reconstruct a corrupted input sample and yielding models which are, ideally, robust feature transforms. In this paper, we pose the research question of whether genetic programming is capable of leveraging data processed using SSL methods to improve its performance. We test this hypothesis by assuming different amounts of labeled data on seven different datasets (five OpenML benchmarking datasets and two real-world datasets). The obtained results show that in almost all problems, standard genetic programming is not able to capitalize on the learned representations, producing results equal to or worse than using the labeled partitions.","PeriodicalId":422029,"journal":{"name":"Proceedings of the Companion Conference on Genetic and Evolutionary Computation","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Companion Conference on Genetic and Evolutionary Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3583133.3590748","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Self-supervised learning (SSL) methods have been widely used to train deep learning models in the computer vision and natural language processing domains. They leverage large amounts of unlabeled data to pretrain models by learning patterns implicit in the data. Recently, new SSL techniques for tabular data have been developed, using pretext tasks that typically aim to reconstruct a corrupted input sample, yielding models that are, ideally, robust feature transforms. In this paper, we pose the research question of whether genetic programming can leverage data processed with SSL methods to improve its performance. We test this question by assuming different amounts of labeled data on seven datasets (five OpenML benchmarking datasets and two real-world datasets). The results show that in almost all problems, standard genetic programming is unable to capitalize on the learned representations, producing results equal to or worse than using the labeled partitions alone.
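To make the kind of pretext task mentioned in the abstract concrete, below is a minimal, illustrative sketch (not the paper's implementation) of a common corrupted-input reconstruction scheme for tabular data: features are randomly replaced with values drawn from other rows, and an encoder/decoder is trained to recover the original sample. All names here (TabularSSL, corrupt, the layer sizes, and the placeholder data) are assumptions for illustration only; the pretrained encoder's output is the kind of learned representation that a downstream learner such as genetic programming would consume instead of the raw features.

```python
# Illustrative sketch of a tabular SSL pretext task (corrupt-and-reconstruct).
# Hypothetical names and hyperparameters; not the method evaluated in the paper.
import torch
import torch.nn as nn

def corrupt(x, p=0.3):
    """Replace a random subset of entries with values taken from other rows."""
    mask = torch.rand_like(x) < p                # which entries to corrupt
    shuffled = x[torch.randperm(x.size(0))]      # donor values from shuffled rows
    return torch.where(mask, shuffled, x)

class TabularSSL(nn.Module):
    def __init__(self, n_features, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Pretraining loop on unlabeled data (placeholder random data for illustration).
X_unlabeled = torch.randn(1024, 20)
model = TabularSSL(n_features=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(10):
    x_tilde = corrupt(X_unlabeled)                              # corrupted view
    loss = nn.functional.mse_loss(model(x_tilde), X_unlabeled)  # reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# After pretraining, model.encoder(X_labeled) yields the learned representation
# that would be fed to the downstream predictor.
```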