Accessible, uniform protein property prediction with a scikit-learn based toolset AIDE.
Evan Komp, Kristoffer E Johansson, Nicholas P Gauthier, Japheth E Gado, Kresten Lindorff-Larsen, Gregg T Beckham
Bioinformatics (Oxford, England), published 2025-09-24. DOI: 10.1093/bioinformatics/btaf544 (https://doi.org/10.1093/bioinformatics/btaf544)
Abstract
Summary: Protein property prediction via machine learning, with and without labeled data, is becoming increasingly powerful, yet methods are disparate and capabilities vary widely across applications. The software presented here, "Artificial Intelligence Driven protein Estimation (AIDE)," defines a modular, standardized application programming interface (API) that is drop-in compatible with scikit-learn transformers and pipelines. This allows many zero-shot and supervised property prediction methods, for variants and variable-length homologs, to be instantiated, optimized, and tested in a single, reproducible notebook or script.
Availability and implementation: AIDE is an importable Python package that inherits from scikit-learn classes and follows the scikit-learn API; it can be installed on Windows, Mac, and Linux. Many of the models wrapped by AIDE will be effectively inaccessible without a GPU, and some assume CUDA. The newest stable, tested version can be found at https://github.com/beckham-lab/aide_predict, and a full user guide and API reference can be found at https://beckham-lab.github.io/aide_predict/. Static versions of both at the time of writing can be found on Zenodo (Komp and Beckham 2025).
Supplementary information: Digital supplementary data contain API examples and a user guide. Appendices A and B provide PDFs of the showcase notebooks. Source data for figures are provided.
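As a rough illustration of what "drop-in compatible with scikit-learn transformers and pipelines" can mean in practice, the sketch below uses only scikit-learn and NumPy. The class OneHotSequenceEmbedding, the toy sequences, and the labels are hypothetical and are not taken from AIDE; the AIDE user guide is the place to look for the package's actual classes and signatures.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class OneHotSequenceEmbedding(TransformerMixin, BaseEstimator):
    """Hypothetical transformer (NOT an AIDE class).

    One-hot encodes equal-length protein sequences so that any
    scikit-learn estimator can consume them.
    """

    def __init__(self, alphabet=AMINO_ACIDS):
        self.alphabet = alphabet

    def fit(self, X, y=None):
        # Remember the sequence length and the residue-to-column mapping.
        self.seq_len_ = len(X[0])
        self.index_ = {aa: i for i, aa in enumerate(self.alphabet)}
        return self

    def transform(self, X):
        n_letters = len(self.alphabet)
        out = np.zeros((len(X), self.seq_len_ * n_letters))
        for row, seq in enumerate(X):
            if len(seq) != self.seq_len_:
                raise ValueError("all sequences must share the fitted length")
            for pos, aa in enumerate(seq):
                out[row, pos * n_letters + self.index_[aa]] = 1.0
        return out


# Toy variants with made-up property labels, for illustration only.
sequences = ["ACDE", "ACDF", "GCDE", "ACKE"]
labels = np.array([1.0, 0.5, 0.8, 0.2])

# Because the embedding follows the transformer API, it composes with
# standard scikit-learn pipelines and supervised estimators.
model = Pipeline([
    ("embed", OneHotSequenceEmbedding()),
    ("regress", Ridge(alpha=1.0)),
])
model.fit(sequences, labels)
predictions = model.predict(["ACDE", "GCKE"])
```

Because the embedding step is an ordinary transformer, the same pipeline object could in principle be passed to scikit-learn utilities such as cross-validation or hyperparameter search without further glue code.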