Ruite Xiang, Christian Domínguez-Dalmases, Albert Cañellas-Solé, Victor Guallar
aMLProt: an automated machine learning library for protein applications
Bioinformatics (Oxford, England), Journal Article, published 2025-09-24. DOI: https://doi.org/10.1093/bioinformatics/btaf543
Abstract
Motivation: Machine learning tools have become increasingly common in biological research, driven by the emergence of pre-trained large language models. However, training effective models remains a complex task, since many choices influence their performance. AutoML (automated machine learning) approaches help address these challenges by streamlining the entire model development pipeline.
Results: We developed aMLProt, an AutoML framework tailored specifically for protein applications, such as enzyme engineering and bioprospecting. It features a modular design, allowing each component to be used independently or in combination. Notably, aMLProt integrates 19 classifiers and 26 regressors, along with pre-trained protein language models. It also includes standalone applications proven useful for protein-related workflows. To enhance usability, aMLProt is integrated with Horus, a GUI-based application.
Availability: aMLProt is available at https://github.com/etiur/aMLProt.git and https://doi.org/10.5281/zenodo.14971157. The aMLProt plugin is available via the official Horus Plugin Repository (https://horus.bsc.es/repo/plugins/amlprot), and Horus itself can be freely downloaded from https://horus.bsc.es. A demo of aMLProt that requires no prior registration or download is available at horus.bsc.es/amlprot and horus.bsc.es/amlprot-suggest. The results and data from the pH optima regression model are available at https://zenodo.org/records/15394097.
Supplementary information: Supplementary data are available at Bioinformatics online.
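Illustrative note: the abstract describes AutoML as automating the choices that influence model performance. The following is a minimal sketch of that general idea, comparing a few candidate regressors by k-fold cross-validation and keeping the best one. It is not aMLProt's API or its method. The toy models, the function names (`cv_rmse`, `select_best`), and the random 4-dimensional features standing in for protein language model embeddings are all assumptions made for illustration.

```python
import random
import statistics


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


class MeanRegressor:
    """Baseline candidate: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = statistics.fmean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class KNNRegressor:
    """Candidate: averages the targets of the k nearest training points."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Rank training points by squared Euclidean distance to x.
            order = sorted(
                range(len(self.X_)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, self.X_[i])),
            )
            preds.append(statistics.fmean([self.y_[i] for i in order[: self.k]]))
        return preds


def cv_rmse(make_model, X, y, k=5):
    """Mean root-mean-squared error over k held-out folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        mse = statistics.fmean([(p - y[i]) ** 2 for p, i in zip(preds, test_idx)])
        scores.append(mse ** 0.5)
    return statistics.fmean(scores)


def select_best(candidates, X, y, k=5):
    """Score every candidate factory and return the lowest-error name and all scores."""
    results = {name: cv_rmse(factory, X, y, k) for name, factory in candidates.items()}
    return min(results, key=results.get), results


# Hypothetical data: random vectors stand in for sequence embeddings,
# and the target is a noisy linear function of two of the features.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(60)]
y = [2.0 * x[0] - x[1] + rng.gauss(0, 0.05) for x in X]

candidates = {
    "mean_baseline": MeanRegressor,
    "knn_1": lambda: KNNRegressor(1),
    "knn_5": lambda: KNNRegressor(5),
}
best_name, scores = select_best(candidates, X, y)
print(best_name, scores)
```

A full AutoML pipeline would also search over features, hyperparameters, and many more model families; this sketch shows only the compare-and-select step.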