Yin-yang in drug discovery: rethinking de novo design and development of predictive models

Frontiers in Drug Discovery. Published: 2023-06-21. DOI: 10.3389/fddsv.2023.1222655

Ana L. Chávez‐Hernández, E. López-López, J. Medina‐Franco


Abstract

Chemical and biological data are the cornerstone of modern drug discovery programs. Finding qualitative, or better yet quantitative, relationships between chemical structures and biological activity has long been pursued in medicinal chemistry and drug discovery. With the rapid growth and deployment of predictive machine learning and deep learning methods, and with renewed interest in the de novo design of compound libraries to enlarge the medicinally relevant chemical space, the balance between the quantity and the quality of data is becoming central to the discussion of which types of data sets are needed. Although it is generally assumed that more data is better, data quality is crucial regardless of the size of the data set. Furthermore, the ratio of active to inactive compounds is also a major consideration. This review discusses the most common public data sets currently used as benchmarks to develop the predictive and classification models used in de novo design. We point out the need to continue disclosing inactive compounds and negative data in peer-reviewed publications and public repositories, and to promote a balance between positive (Yang) and negative (Yin) bioactivity data. We emphasize the importance of reconsidering drug discovery initiatives with regard to both the use and the classification of data.

