Visualising lead optimisation series using reduced graphs

IF 7.1 2区 化学 Q1 CHEMISTRY, MULTIDISCIPLINARY
Jessica Stacey, Baptiste Canault, Stephen D. Pickett, Valerie J. Gillet
{"title":"Visualising lead optimisation series using reduced graphs","authors":"Jessica Stacey,&nbsp;Baptiste Canault,&nbsp;Stephen D. Pickett,&nbsp;Valerie J. Gillet","doi":"10.1186/s13321-025-01002-7","DOIUrl":null,"url":null,"abstract":"<div><p>The typical way in which lead optimisation (LO) series are represented in the medicinal chemistry literature is as Markush structures and associated R-group tables. The Markush structure shows a central core or molecular scaffold that is common to the series with R groups that indicate the points of variability that have been explored in the series. The associated R-group table shows the substituent combinations that exist in individual molecules in the series together with properties of those compounds. This format provides an intuitive way of visualising any structure–activity relationship (SAR) that is present. Automated approaches that attempt to reproduce this well understood format, such as the SAR map, are based on maximum common substructure approaches and do not take account of small changes that may be made to the core structure itself or of the situation where more than one core exists in the data. Here we describe an automated approach to represent LO series that is based on reduced graph descriptions of molecules. A publicly available LO dataset from a drug discovery programme at GSK is analysed to show how the method can group together compounds from the same series even when there are small substructural differences within the core of the series while also being able to identify different related compound series. The resulting visualisation is useful in identifying areas where series are under explored and for mapping design ideas onto the current dataset. The code to generate the visualisations is released into the public domain to promote further research in this area.</p><p><b>Scientific contribution</b>: We describe a software tool for analysing lead optimisation series using reduced graph representations of molecules. The representation allows compounds that have similar but not identical chemical scaffolds to be grouped together and is, therefore, an advance on methods that are based on the more traditional Markush structure and SAR tables. The software is a useful addition to the med chem toolbox as it can provide a holistic view of lead optimisation data by representing what might otherwise be seen as separate series as a single series of compounds.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1000,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01002-7","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-025-01002-7","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

The typical way in which lead optimisation (LO) series are represented in the medicinal chemistry literature is as Markush structures and associated R-group tables. The Markush structure shows a central core or molecular scaffold that is common to the series with R groups that indicate the points of variability that have been explored in the series. The associated R-group table shows the substituent combinations that exist in individual molecules in the series together with properties of those compounds. This format provides an intuitive way of visualising any structure–activity relationship (SAR) that is present. Automated approaches that attempt to reproduce this well understood format, such as the SAR map, are based on maximum common substructure approaches and do not take account of small changes that may be made to the core structure itself or of the situation where more than one core exists in the data. Here we describe an automated approach to represent LO series that is based on reduced graph descriptions of molecules. A publicly available LO dataset from a drug discovery programme at GSK is analysed to show how the method can group together compounds from the same series even when there are small substructural differences within the core of the series while also being able to identify different related compound series. The resulting visualisation is useful in identifying areas where series are under explored and for mapping design ideas onto the current dataset. The code to generate the visualisations is released into the public domain to promote further research in this area.

Scientific contribution: We describe a software tool for analysing lead optimisation series using reduced graph representations of molecules. The representation allows compounds that have similar but not identical chemical scaffolds to be grouped together and is, therefore, an advance on methods that are based on the more traditional Markush structure and SAR tables. The software is a useful addition to the med chem toolbox as it can provide a holistic view of lead optimisation data by representing what might otherwise be seen as separate series as a single series of compounds.

可视化领先优化系列使用简化的图表
在药物化学文献中,铅优化(LO)系列的典型表示方式是马库什结构和相关的r -基团表。马库什结构显示了一个中心核心或分子支架,这是该系列共同的R组,表明该系列中已探索的变异性点。相关的r -基团表显示了该系列中存在于单个分子中的取代基组合以及这些化合物的性质。这种格式提供了一种直观的方式来显示任何存在的结构-活性关系(SAR)。试图复制这种易于理解的格式的自动化方法,如SAR图,是基于最大共同子结构方法,而不考虑可能对核心结构本身做出的微小变化或数据中存在多个核心的情况。在这里,我们描述了一种基于分子的简化图描述来表示LO系列的自动化方法。我们分析了来自GSK药物发现项目的公开可用的LO数据集,以展示该方法如何将来自同一系列的化合物组合在一起,即使该系列的核心内部存在微小的亚结构差异,同时还能够识别不同的相关化合物系列。由此产生的可视化对于识别有待探索的序列区域以及将设计思想映射到当前数据集非常有用。生成可视化的代码被发布到公共领域,以促进该领域的进一步研究。科学贡献:我们描述了一个软件工具,用于分析铅优化系列使用分子的简化图形表示。该表示允许将具有相似但不相同的化学支架的化合物分组在一起,因此是基于更传统的Markush结构和SAR表的方法的一种进步。该软件是医学化学工具箱中一个有用的补充,因为它可以通过将可能被视为单独系列的化合物表示为单个系列,从而提供先导优化数据的整体视图。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Cheminformatics
Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore
14.10
自引率
7.00%
发文量
82
审稿时长
3 months
期刊介绍: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信