Context-dependent similarity searching for small molecular fragments

IF 5.7 Region 2 (Chemistry) Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics Pub Date : 2025-05-26 DOI:10.1186/s13321-025-01032-1

Atsushi Yoshimori, Jürgen Bajorath


Citation count: 0

Abstract

Similarity searching is a mainstay in cheminformatics that is generally used to identify compounds with desired properties. For small molecular fragments, similarity calculations based on standard descriptors often have limited utility for establishing meaningful similarity relationships due to feature sparseness. As an alternative, we have adapted the concept of context-dependent word pair similarity from natural language processing to evaluate similarity relationships between substituents (R-groups), taking latent characteristics into account. Context-dependent similarity assessment is based on vector embeddings, generated using neural networks, as fragment representations. With active analogue series as a model system to establish a global structure–activity context, we demonstrate that this approach is applicable to systematic similarity searching for substituents and improves on the performance of standard descriptor representations. Context-dependent similarity searching is capable of detecting remote and functionally relevant similarity relationships between substituents. Alternative search queries are introduced that focus on individual substituents within a global substituent context or on individual sequences of substituents that establish a local context. For similarity searching, different structural or structure–property contexts can be established, providing opportunities for various applications.
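The contrast drawn in the abstract between sparse descriptors and embedding-based similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the feature sets and three-dimensional embedding vectors below are hand-made toy values, whereas the paper derives embeddings with neural networks from analogue series, and the names `tanimoto`, `cosine`, and `rank_by_similarity` are hypothetical helpers introduced only for this example.

```python
import math

def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two feature sets (e.g., fingerprint on-bits)."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_by_similarity(query: str, embeddings: dict) -> list:
    """Rank all other substituents by cosine similarity to the query."""
    q = embeddings[query]
    scores = [(name, cosine(q, vec))
              for name, vec in embeddings.items() if name != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy data (assumed for illustration only, not taken from the paper).
# Very small fragments set few descriptor bits, so two halogens can end up
# with no shared features at all.
features = {"F": {1}, "Cl": {2}, "OCH3": {3, 4}}

# Made-up embeddings standing in for vectors a neural network might learn
# from the contexts in which substituents occur.
embeddings = {
    "F":    [0.9, 0.1, 0.0],
    "Cl":   [0.8, 0.2, 0.1],
    "OCH3": [0.1, 0.9, 0.3],
}

sparse_score = tanimoto(features["F"], features["Cl"])
ranking = rank_by_similarity("F", embeddings)
print(f"Tanimoto(F, Cl) on sparse features: {sparse_score:.2f}")
for name, score in ranking:
    print(f"cosine(F, {name}) = {score:.2f}")
```

With these toy values, the sparse feature sets of F and Cl share nothing, while their embeddings place Cl ahead of OCH3 in the ranking for the query F. Whether learned embeddings behave this way depends on the training context, which this sketch does not model.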


Previously, we introduced context-dependent similarity assessment for the alignment of analogue series. The approach is based on the concept of context-dependent word similarity from natural language processing. Herein, the approach is extended to similarity searching for small molecular fragments. Context-dependent similarity searching takes latent fragment characteristics into account and represents a new approach to chemical similarity assessment.


Source journal

Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore

14.10

Self-citation rate

7.00%

Articles published

Review time

3 months

About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.