Network Analysis of the Organic Chemistry in Patents, Literature, and Pharmaceutical Industry.

IF 3.1 4区医学 Q3 CHEMISTRY, MEDICINAL

Molecular Informatics Pub Date : 2025-07-01 DOI:10.1002/minf.202500011

Emma Svensson, Emma Granqvist, Tomas Bastys, Christos Kannas, Mikhail Kabeshov, Samuel Genheden, Ola Engkvist, Thierry Kogej

{"title":"Network Analysis of the Organic Chemistry in Patents, Literature, and Pharmaceutical Industry.","authors":"Emma Svensson, Emma Granqvist, Tomas Bastys, Christos Kannas, Mikhail Kabeshov, Samuel Genheden, Ola Engkvist, Thierry Kogej","doi":"10.1002/minf.202500011","DOIUrl":null,"url":null,"abstract":"<p><p>Chemical reactions can be connected in large networks such as knowledge graphs. In this way, prior work has been able to draw meaningful conclusions about the properties and structures involved in organic chemistry reactions. However, the research has focused on public sources of organic synthesis that might lack the intricate details of the synthetic routes used in in-house drug discovery. In this work, previous analyses are expanded to also include an in-house electronic lab notebook (ELN) source, such that we can compare it to knowledge graphs that were constructed from US Patent and Trademark Office (USPTO) and Reaxys. We found that the Reaxys knowledge graph is the most interconnected and has the largest proportion of nodes belonging to the core, whereas the USPTO is much less connected and only has a small core. The ELN knowledge graph falls between these extremes in connectivity and it does not have any core. The hub molecules of ELN and USPTO are most similar, primarily represented by small, organic building blocks. We hypothesize that these differences can be attributed to the different origins of the data in the three sources. We discuss what impact this might have on synthesis prediction modelling.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 7","pages":"e202500011"},"PeriodicalIF":3.1000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12273192/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/minf.202500011","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"CHEMISTRY, MEDICINAL","Score":null,"Total":0}

引用次数: 0

Abstract

Chemical reactions can be connected in large networks such as knowledge graphs. In this way, prior work has been able to draw meaningful conclusions about the properties and structures involved in organic chemistry reactions. However, the research has focused on public sources of organic synthesis that might lack the intricate details of the synthetic routes used in in-house drug discovery. In this work, previous analyses are expanded to also include an in-house electronic lab notebook (ELN) source, such that we can compare it to knowledge graphs that were constructed from US Patent and Trademark Office (USPTO) and Reaxys. We found that the Reaxys knowledge graph is the most interconnected and has the largest proportion of nodes belonging to the core, whereas the USPTO is much less connected and only has a small core. The ELN knowledge graph falls between these extremes in connectivity and it does not have any core. The hub molecules of ELN and USPTO are most similar, primarily represented by small, organic building blocks. We hypothesize that these differences can be attributed to the different origins of the data in the three sources. We discuss what impact this might have on synthesis prediction modelling.

Abstract Image

查看原文本刊更多论文

有机化学在专利、文献和制药工业中的网络分析。

化学反应可以在像知识图谱这样的大网络中连接起来。通过这种方式，先前的工作已经能够得出有关有机化学反应的性质和结构的有意义的结论。然而，这项研究的重点是有机合成的公共来源，可能缺乏内部药物发现中使用的合成路线的复杂细节。在这项工作中，先前的分析被扩展到还包括内部电子实验室笔记本（ELN）源，这样我们就可以将其与美国专利商标局（USPTO）和Reaxys构建的知识图谱进行比较。我们发现Reaxys知识图谱的关联度最高，属于核心的节点比例最大，而USPTO的关联度要低得多，只有一个小核心。ELN知识图谱在连通性方面介于这两个极端之间，它没有任何核心。ELN和USPTO的中心分子最相似，主要由小的有机构建块表示。我们假设这些差异可以归因于三个来源的数据的不同来源。我们讨论了这可能对合成预测建模产生的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Molecular Informatics CHEMISTRY, MEDICINAL-MATHEMATICAL & COMPUTATIONAL BIOLOGY

CiteScore

7.30

自引率

2.80%

发文量

审稿时长

3 months

期刊介绍： Molecular Informatics is a peer-reviewed, international forum for publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010. Molecular Informatics presents methodological innovations that will lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, molecular networks, design concepts and processes that demonstrate how ideas and design concepts lead to molecules with a desired structure or function, preferably including experimental validation. The journal''s scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature Molecular Informatics publishes so-called "Methods Corner" review-type articles which feature important technological concepts and advances within the scope of the journal.