Machine Learning-Driven Discovery and Database of Cyanobacteria Bioactive Compounds: A Resource for Therapeutics and Bioremediation.

IF 5.6 | CAS Zone 2, Chemistry | JCR Q1, Chemistry, Medicinal

Journal of Chemical Information and Modeling | Pub Date: 2024-11-27 | DOI: 10.1021/acs.jcim.4c00995

Renato Soares, Luísa Azevedo, Vitor Vasconcelos, Diogo Pratas, Sérgio F Sousa, João Carneiro

Citations: 0

Abstract

Cyanobacteria strains have the potential to produce bioactive compounds that can be used in therapeutics and bioremediation. Compiling the available information about these compounds is therefore essential for assessing their value as bioresources for industrial and research applications. In this study, a searchable, up-to-date, curated, and downloadable database of cyanobacteria bioactive compounds was designed, along with a machine-learning model to predict the protein targets of newly discovered molecules. A Python protocol collected 3431 cyanobacteria bioactive compounds, 373 unique protein targets, and 3027 molecular descriptors. PaDEL-Descriptor, Mordred, and DrugTax were used to calculate the chemical descriptors for each compound record in the database. These descriptors were then used, with the best-performing machine learning (ML) model, to determine the most promising protein targets for human therapeutic approaches and environmental bioremediation. The database, coupled with computational docking protocols, offers a new way to explore the potential of cyanobacteria bioactive compounds. The resource adheres to the FAIR principles (findability, accessibility, interoperability, and reusability of digital assets) and is intended as a tool for pharmaceutical and bioremediation researchers. It also facilitates the exploration of specific compounds' interactions with environmental pollutants, in line with the increasing reliance on data science and machine learning to address environmental challenges. This study is a step forward in leveraging cyanobacteria for both therapeutic and ecological sustainability.
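
The abstract does not say which ML model performed best or how descriptors were preprocessed, so the following is only a minimal sketch of the general idea of predicting a compound's target from its descriptor vector. It uses a plain k-nearest-neighbour vote, chosen here for brevity and not taken from the paper. The compound names, descriptor values, target labels, and the function names `column_stats`, `standardise`, and `predict_target` are all hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev

# Hypothetical toy records: (compound id, descriptor vector, target label).
# The three descriptors stand in for values such as molecular weight, logP,
# and H-bond acceptor count. All numbers are made up for illustration and
# are NOT taken from the database described in the abstract.
TRAINING = [
    ("cmpd_A", [310.4, 2.1, 4.0], "target_X"),
    ("cmpd_B", [298.3, 1.8, 5.0], "target_X"),
    ("cmpd_C", [845.0, 5.6, 11.0], "target_Y"),
    ("cmpd_D", [910.2, 6.1, 12.0], "target_Y"),
    ("cmpd_E", [325.0, 2.4, 4.0], "target_X"),
]


def column_stats(rows):
    """Return a (mean, std) pair per descriptor; a zero std is replaced by 1."""
    stats = []
    for col in zip(*rows):
        sd = pstdev(col)
        stats.append((mean(col), sd if sd > 0 else 1.0))
    return stats


def standardise(vec, stats):
    """Scale each descriptor to zero mean and unit variance."""
    return [(x - m) / s for x, (m, s) in zip(vec, stats)]


def predict_target(query, training=TRAINING, k=3):
    """Predict a target label by majority vote among the k nearest compounds."""
    stats = column_stats([desc for _, desc, _ in training])
    q = standardise(query, stats)
    scored = []
    for _, desc, target in training:
        z = standardise(desc, stats)
        dist = sqrt(sum((a - b) ** 2 for a, b in zip(q, z)))
        scored.append((dist, target))
    # Sort by distance only, then let the k closest compounds vote.
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(target for _, target in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(predict_target([305.0, 2.0, 4.0]))
    print(predict_target([880.0, 5.9, 12.0]))
```

In a real workflow the descriptor vectors would come from tools such as PaDEL-Descriptor or Mordred, and the classifier would be selected by validation; standardising first matters because descriptors on large scales would otherwise dominate the distance.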

Source journal

Journal of Chemical Information and Modeling (category: Chemistry, Multidisciplinary)

CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months

About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.