Rachit Kumar, Joseph D. Romano, M. Ritchie, Jason Moore
{"title":"使用自定义特征提取器将基于树的自动机器学习扩展到生物医学图像和文本数据","authors":"Rachit Kumar, Joseph D. Romano, M. Ritchie, Jason Moore","doi":"10.1145/3583133.3590584","DOIUrl":null,"url":null,"abstract":"Automated machine learning (AutoML) has allowed for many innovations in biomedical data science; however, most AutoML approaches do not support image or text data. To rectify this, we implemented four feature extractors in the Tree-based Pipeline Optimization Tool (TPOT) to make TPOT with Feature Extraction (TPOT-FE), an automated machine learning system that uses genetic programming (GP) to create ideal pipelines for a classification or regression task. These feature extractors enable TPOT-FE to build pipelines that can analyze non-tabular data, including text and images, which are increasingly common biomedical big data modalities that can contain rich quantities of information. We evaluate this approach on six image datasets and four text datasets, including three biomedical datasets, and show that TPOT-FE is able to consistently construct and optimize classification pipelines on all of the datasets.","PeriodicalId":422029,"journal":{"name":"Proceedings of the Companion Conference on Genetic and Evolutionary Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Extending Tree-Based Automated Machine Learning to Biomedical Image and Text Data Using Custom Feature Extractors\",\"authors\":\"Rachit Kumar, Joseph D. Romano, M. Ritchie, Jason Moore\",\"doi\":\"10.1145/3583133.3590584\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automated machine learning (AutoML) has allowed for many innovations in biomedical data science; however, most AutoML approaches do not support image or text data. To rectify this, we implemented four feature extractors in the Tree-based Pipeline Optimization Tool (TPOT) to make TPOT with Feature Extraction (TPOT-FE), an automated machine learning system that uses genetic programming (GP) to create ideal pipelines for a classification or regression task. These feature extractors enable TPOT-FE to build pipelines that can analyze non-tabular data, including text and images, which are increasingly common biomedical big data modalities that can contain rich quantities of information. We evaluate this approach on six image datasets and four text datasets, including three biomedical datasets, and show that TPOT-FE is able to consistently construct and optimize classification pipelines on all of the datasets.\",\"PeriodicalId\":422029,\"journal\":{\"name\":\"Proceedings of the Companion Conference on Genetic and Evolutionary Computation\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Companion Conference on Genetic and Evolutionary Computation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3583133.3590584\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Companion Conference on Genetic and Evolutionary Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3583133.3590584","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Extending Tree-Based Automated Machine Learning to Biomedical Image and Text Data Using Custom Feature Extractors
Automated machine learning (AutoML) has allowed for many innovations in biomedical data science; however, most AutoML approaches do not support image or text data. To rectify this, we implemented four feature extractors in the Tree-based Pipeline Optimization Tool (TPOT) to make TPOT with Feature Extraction (TPOT-FE), an automated machine learning system that uses genetic programming (GP) to create ideal pipelines for a classification or regression task. These feature extractors enable TPOT-FE to build pipelines that can analyze non-tabular data, including text and images, which are increasingly common biomedical big data modalities that can contain rich quantities of information. We evaluate this approach on six image datasets and four text datasets, including three biomedical datasets, and show that TPOT-FE is able to consistently construct and optimize classification pipelines on all of the datasets.