Deepmol: an automated machine and deep learning framework for computational chemistry

IF 7.1 | Zone 2 (Chemistry) | JCR Q1: Chemistry, Multidisciplinary
João Correia, João Capela, Miguel Rocha
Journal of Cheminformatics, 16(1). Published 2024-12-05. DOI: 10.1186/s13321-024-00937-7
Article: https://link.springer.com/article/10.1186/s13321-024-00937-7
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00937-7
Citations: 0

Abstract

The domain of computational chemistry has experienced a significant evolution due to the introduction of Machine Learning (ML) technologies. Despite the potential of ML to revolutionize the field, researchers are often encumbered by obstacles such as the complexity of selecting optimal algorithms, the automation of data pre-processing steps, the necessity for adaptive feature engineering, and the assurance of model performance consistency across different datasets. Addressing these issues head-on, DeepMol stands out as an Automated ML (AutoML) tool by automating critical steps of the ML pipeline. DeepMol rapidly and automatically identifies the most effective data representation, pre-processing methods and model configurations for a specific molecular property/activity prediction problem. On 22 benchmark datasets, DeepMol obtained competitive pipelines compared with those requiring time-consuming feature engineering, model design and selection processes. As one of the first AutoML tools specifically developed for the computational chemistry domain, DeepMol stands out with its open-source code, in-depth tutorials, detailed documentation, and examples of real-world applications, all available at https://github.com/BioSystemsUM/DeepMol and https://deepmol.readthedocs.io/en/latest/. By introducing AutoML as a groundbreaking feature in computational chemistry, DeepMol establishes itself as the pioneering state-of-the-art tool in the field.
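
The idea of automatically choosing a data representation and a model configuration can be illustrated with a small search loop. The sketch below is not DeepMol's API: the featurizers, models, toy data and the `search` function are all hypothetical stand-ins written with the Python standard library, and an exhaustive grid is assumed here purely for brevity.

```python
from itertools import product

# Toy data: SMILES-like strings with made-up targets (illustrative only).
train = [("C", 1.0), ("CC", 2.0), ("CCC", 3.0), ("CCO", 3.5), ("CCCC", 4.0)]
valid = [("CO", 2.5), ("CCCO", 4.5)]

# Hypothetical candidate representations.
def length_feature(s):
    return [float(len(s))]

def atom_counts(s):
    return [float(s.count("C")), float(s.count("O"))]

# Hypothetical candidate models, each exposing fit/predict.
class MeanModel:
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

class NearestNeighbour:
    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X_]
            out.append(self.y_[dists.index(min(dists))])
        return out

def mean_abs_error(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def search(featurizers, models, train, valid):
    """Score every featurizer/model pair on the validation set; lower is better."""
    results = []
    for (f_name, feat), (m_name, model_cls) in product(featurizers.items(), models.items()):
        X_tr, y_tr = [feat(s) for s, _ in train], [t for _, t in train]
        X_va, y_va = [feat(s) for s, _ in valid], [t for _, t in valid]
        model = model_cls().fit(X_tr, y_tr)
        results.append((mean_abs_error(y_va, model.predict(X_va)), f_name, m_name))
    return min(results), results

featurizers = {"length": length_feature, "atom_counts": atom_counts}
models = {"mean": MeanModel, "nearest_neighbour": NearestNeighbour}
best, results = search(featurizers, models, train, valid)
print(best)
```

A real AutoML tool would typically replace the exhaustive grid with a guided optimizer and use cross-validation rather than a single validation split; the loop above only shows the shape of the problem.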

Scientific contribution

DeepMol aims to provide an integrated AutoML framework for computational chemistry. DeepMol provides a more robust alternative to other tools with its integrated pipeline serialization, enabling seamless deployment using the fit, transform, and predict paradigms. It uniquely supports both conventional and deep learning models for regression, classification and multi-task problems, offering unmatched flexibility compared to other AutoML tools. DeepMol's predefined configurations and customizable objective functions make it accessible to users at all skill levels while enabling efficient and reproducible workflows. Benchmarking on diverse datasets demonstrated its ability to deliver optimized pipelines and superior performance across various molecular machine-learning tasks.
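
To make the fit, transform and predict paradigm and the role of pipeline serialization concrete, here is a minimal standard-library sketch. It is an illustration under stated assumptions, not DeepMol's implementation: the class names `MinMaxScaler`, `NearestCentroid` and `Pipeline`, the `dumps`/`loads` methods and the JSON format are all hypothetical.

```python
import json

class MinMaxScaler:
    """Transformer step: learns column ranges in fit, rescales in transform."""
    def fit(self, X):
        cols = list(zip(*X))
        self.min_ = [min(c) for c in cols]
        self.span_ = [(max(c) - min(c)) or 1.0 for c in cols]
        return self

    def transform(self, X):
        return [[(v - lo) / sp for v, lo, sp in zip(row, self.min_, self.span_)]
                for row in X]

class NearestCentroid:
    """Model step: predicts the label whose class centroid is closest."""
    def fit(self, X, y):
        self.centroids_ = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids_[label] = [sum(c) / len(rows) for c in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids_, key=lambda lab: sq_dist(x, self.centroids_[lab]))
                for x in X]

class Pipeline:
    """Chains the scaler and the model behind one fit/transform/predict interface."""
    def __init__(self):
        self.scaler = MinMaxScaler()
        self.model = NearestCentroid()

    def fit(self, X, y):
        self.model.fit(self.scaler.fit(X).transform(X), y)
        return self

    def transform(self, X):
        return self.scaler.transform(X)

    def predict(self, X):
        return self.model.predict(self.transform(X))

    def dumps(self):
        # Serialize only the fitted state (hypothetical format).
        return json.dumps({"scaler": vars(self.scaler), "model": vars(self.model)})

    @classmethod
    def loads(cls, text):
        state = json.loads(text)
        pipe = cls()
        vars(pipe.scaler).update(state["scaler"])
        vars(pipe.model).update(state["model"])
        return pipe

# Made-up descriptor rows and activity labels.
X = [[0.0, 10.0], [1.0, 12.0], [8.0, 30.0], [9.0, 28.0]]
y = ["inactive", "inactive", "active", "active"]
pipe = Pipeline().fit(X, y)

restored = Pipeline.loads(pipe.dumps())  # e.g. after shipping to another machine
new = [[0.5, 11.0], [8.5, 29.0]]
print(restored.predict(new))
```

Because the pre-processing state travels with the model, the restored pipeline applies the same scaling at prediction time as at training time, which is the main practical benefit of serializing the whole pipeline rather than the model alone.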

Source journal
Journal of Cheminformatics (Chemistry, Multidisciplinary; Computer Science, Information Systems)
CiteScore: 14.10
Self-citation rate: 7.00%
Articles published: 82
Review time: 3 months
About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.