Hengwei Chen, Atsushi Yoshimori and Jürgen Bajorath
Journal: MedChemComm, volume 7, pages 2527-2537 (Journal Article)
Publication date: 2024-06-17
DOI: 10.1039/D4MD00423J
URL: https://pubs.rsc.org/en/content/articlelanding/2024/md/d4md00423j
Impact factor: 3.597; JCR: Q2 (Pharmacology, Toxicology and Pharmaceutics)
Citations: 0
Extension of multi-site analogue series with potent compounds using a bidirectional transformer-based chemical language model
Abstract
Generating potent compounds for evolving analogue series (AS) is a key challenge in medicinal chemistry. The versatility of chemical language models (CLMs) makes it possible to formulate this challenge as an off-the-beaten-path prediction task. In this work, we have devised a coding and tokenization scheme for evolving AS with multiple substitution sites (multi-site AS) and implemented a bidirectional transformer to predict new potent analogues for such series. Scientific foundations of this approach are discussed and, as a benchmark, the transformer model is compared to a recurrent neural network (RNN) for the prediction of analogues of AS with single substitution sites. Furthermore, the transformer is shown to successfully predict potent analogues with varying R-group combinations for multi-site AS having activity against many different targets. Prediction of R-group combinations for extending AS with potent compounds represents a novel approach for compound optimization.
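The abstract does not spell out the coding and tokenization scheme, so the following is only an illustrative sketch of how a multi-site analogue could be turned into a token sequence for a masked (bidirectional) model. It assumes a core given as SMILES with numbered attachment points and one R-group SMILES per site. The special tokens `[CLS]`, `[R1]`, `[R2]`, `[MASK]` and `[EOS]`, the function names, and the sequence layout are all hypothetical and are not taken from the paper.

```python
import re

# A commonly used regex for splitting SMILES into atom, bond and ring tokens.
# Bracket atoms and two-letter halogens are tried before single letters.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFIbcnosp]|[=#$/\\+\-()%.:~@?>*]|\d)"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail if any character is dropped."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in SMILES: {smiles!r}")
    return tokens


def encode_analogue(core, r_groups, masked_sites=()):
    """Build a token sequence: core tokens, then one segment per site.

    core         -- core SMILES with attachment points such as [*:1]
    r_groups     -- dict mapping site number to R-group SMILES
    masked_sites -- sites whose R-group is hidden behind a [MASK] token
                    (hypothetical layout; a bidirectional model would fill
                    these in using the context on both sides)
    """
    sequence = ["[CLS]"] + tokenize_smiles(core)
    for site in sorted(r_groups):
        sequence.append(f"[R{site}]")  # hypothetical site marker token
        if site in masked_sites:
            sequence.append("[MASK]")
        else:
            sequence.extend(tokenize_smiles(r_groups[site]))
    sequence.append("[EOS]")
    return sequence


if __name__ == "__main__":
    # Toy two-site analogue: a benzene core with a methoxy group and a chlorine.
    example = encode_analogue(
        "c1ccc([*:1])cc1[*:2]", {1: "OC", 2: "Cl"}, masked_sites={2}
    )
    print(" ".join(example))
```

In this toy layout, masking one or several site segments turns the prediction of R-group combinations into a fill-in task over a fixed core, which is one way to read the task described in the abstract.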
About the journal:
Research and review articles in medicinal chemistry and related drug discovery science; the official journal of the European Federation for Medicinal Chemistry.
In 2020, MedChemComm will change its name to RSC Medicinal Chemistry. Issue 12, 2019 will be the last issue as MedChemComm.