Rachit Kumar, Joseph D. Romano, M. Ritchie, Jason Moore
{"title":"Extending Tree-Based Automated Machine Learning to Biomedical Image and Text Data Using Custom Feature Extractors","authors":"Rachit Kumar, Joseph D. Romano, M. Ritchie, Jason Moore","doi":"10.1145/3583133.3590584","DOIUrl":null,"url":null,"abstract":"Automated machine learning (AutoML) has allowed for many innovations in biomedical data science; however, most AutoML approaches do not support image or text data. To rectify this, we implemented four feature extractors in the Tree-based Pipeline Optimization Tool (TPOT) to make TPOT with Feature Extraction (TPOT-FE), an automated machine learning system that uses genetic programming (GP) to create ideal pipelines for a classification or regression task. These feature extractors enable TPOT-FE to build pipelines that can analyze non-tabular data, including text and images, which are increasingly common biomedical big data modalities that can contain rich quantities of information. We evaluate this approach on six image datasets and four text datasets, including three biomedical datasets, and show that TPOT-FE is able to consistently construct and optimize classification pipelines on all of the datasets.","PeriodicalId":422029,"journal":{"name":"Proceedings of the Companion Conference on Genetic and Evolutionary Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Companion Conference on Genetic and Evolutionary Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3583133.3590584","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Automated machine learning (AutoML) has allowed for many innovations in biomedical data science; however, most AutoML approaches do not support image or text data. To rectify this, we implemented four feature extractors in the Tree-based Pipeline Optimization Tool (TPOT) to make TPOT with Feature Extraction (TPOT-FE), an automated machine learning system that uses genetic programming (GP) to create ideal pipelines for a classification or regression task. These feature extractors enable TPOT-FE to build pipelines that can analyze non-tabular data, including text and images, which are increasingly common biomedical big data modalities that can contain rich quantities of information. We evaluate this approach on six image datasets and four text datasets, including three biomedical datasets, and show that TPOT-FE is able to consistently construct and optimize classification pipelines on all of the datasets.