Molecular Informatics: Latest Articles

Transformer Learning in Sequence-Based Drug Design Depends on Compound Memorization and Similarity of Sequence-Compound Pairs.
IF 3.1, Zone 4, Medicine
Molecular Informatics Pub Date : 2026-01-01 DOI: 10.1002/minf.70016
Jürgen Bajorath
{"title":"Transformer Learning in Sequence-Based Drug Design Depends on Compound Memorization and Similarity of Sequence-Compound Pairs.","authors":"Jürgen Bajorath","doi":"10.1002/minf.70016","DOIUrl":"10.1002/minf.70016","url":null,"abstract":"<p><p>Chemical language models (CLMs), particularly encoder-decoder transformers, have advanced generative molecular design. Transformer CLMs are able to learn a variety of molecular mappings for compound design that can be conditioned using context-dependent rules. However, their black-box nature complicates the interpretation of predictions. Current analysis methods mostly focus on attention weights of token relationships or attention flow in encoder and decoder modules and cannot explain predictions at the molecular level. Sequence-based compound design was used as a model system to investigate transformer learning characteristics through systematic control calculations involving modifications of protein sequences and sequence-compound pairs. The analysis revealed that compound reproducibility depended on similarity relationships between training and test data and on compound memorization, while specific sequence information was not learned. These findings indicate that predictions of transformer CLMs are driven by memorization effects and statistical correlations rather than by learning specific chemical or biological information. 
Understanding this learning behavior aids in avoiding over-interpretation of model outputs and informs the appropriate application of transformer-based CLMs in molecular design.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"45 1","pages":"e70016"},"PeriodicalIF":3.1,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12782052/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145934112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Structure-Activity Relationships and Design of Focused Libraries Tailored for Staphylococcus Aureus Inhibition.
IF 3.1, Zone 4, Medicine
Molecular Informatics Pub Date : 2025-11-01 DOI: 10.1002/minf.70015
Alberto Marbán-González, José L Medina-Franco
{"title":"Structure-Activity Relationships and Design of Focused Libraries Tailored for Staphylococcus Aureus Inhibition.","authors":"Alberto Marbán-González, José L Medina-Franco","doi":"10.1002/minf.70015","DOIUrl":"10.1002/minf.70015","url":null,"abstract":"<p><p>Staphylococcus aureus is a bacterium classified among the ESKAPE pathogens, which are anticipated to pose a significant global health emergency in the coming decades. The FabI enzyme, present in both Gram-positive and Gram-negative bacteria, is a key enzyme involved in fatty acid synthesis II (FAS-II). In this study, we utilized transformation rules to expand the chemical space from the most potent S. aureus FabI inhibitors. Three newly generated focused libraries, named INDDS, DIADS, and PYRDS, encompassed 172,026 compounds. These compounds were ranked based on structural similarity and predicted pIC<sub>50</sub> values obtained from machine learning models. This approach allowed us to prioritize compounds in each focused library targeting S. aureus FabI. We analyzed the pharmacological properties and chemical space diversity of the S. aureus FabI inhibitors to gather relevant insights and support the prioritization of compounds for further study. The three newly generated libraries are freely available at https://github.com/DIFACQUIM/S.aureus_inhibitors.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 11-12","pages":"e70015"},"PeriodicalIF":3.1,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12694758/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145724727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
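The ranking step mentioned in the abstract above (structural similarity plus predicted pIC50) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: fingerprints are modeled as plain sets of on-bit indices, and the `Candidate` class, the `rank_candidates` function, the compound names, and the sort order (similarity first, predicted pIC50 as tie-breaker) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    fingerprint: frozenset  # on-bit indices of a hypothetical binary fingerprint
    predicted_pic50: float  # assumed to come from some previously trained ML model


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_candidates(reference: frozenset, candidates: list) -> list:
    """Sort by similarity to the reference, then by predicted pIC50 (both descending)."""
    return sorted(
        candidates,
        key=lambda c: (tanimoto(reference, c.fingerprint), c.predicted_pic50),
        reverse=True,
    )


# Made-up fingerprints and potencies, for illustration only.
reference_fp = frozenset({1, 2, 3, 4})
library = [
    Candidate("cmpd_a", frozenset({1, 2, 3, 4, 5}), 6.2),
    Candidate("cmpd_b", frozenset({1, 2}), 7.9),
    Candidate("cmpd_c", frozenset({1, 2, 3, 4, 5}), 7.1),
]
ranked = rank_candidates(reference_fp, library)
print([c.name for c in ranked])
```

In practice the two criteria might instead be combined into a weighted score; the lexicographic ordering here is only the simplest choice.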
Update and ADMET Profile of the Latin American Natural Product Database: LANaPDB.
IF 3.1, Zone 4, Medicine
Molecular Informatics Pub Date : 2025-11-01 DOI: 10.1002/minf.70013
Alejandro Gómez-García, Martin J Lavecchia, Dionisio A Olmedo, Pablo N Solís, José L Medina-Franco
{"title":"Update and ADMET Profile of the Latin American Natural Product Database: LANaPDB.","authors":"Alejandro Gómez-García, Martin J Lavecchia, Dionisio A Olmedo, Pablo N Solís, José L Medina-Franco","doi":"10.1002/minf.70013","DOIUrl":"10.1002/minf.70013","url":null,"abstract":"<p><p>For more than 5 years, several countries in Latin America have been developing and updating compound databases of natural products (NPs) isolated and characterized in their countries. In parallel, multiple research groups have been collaborating and assembling a unified Latin American Natural Product Database (LANaPDB), an open-access compound collection representative of Latin America that stands out as a geographical region distinguished by its vastness and richness of NP resources. Herein, we report a significant update of LANaPDB, which gathers NPs from eight countries. Major updates to the database include adding 1,164 new compounds obtained from NaturAr, a NP collection from Argentina published in 2025, and 132 new compounds from Panama. The updated LANaPDB has 14,742 nonduplicate compounds. Moreover, a comprehensive evaluation of 41 ADMET (absorption, distribution, metabolism, excretion, and toxicity)-related parameters was carried out for LANaPDB, and the results were compared with one of the largest NP databases, the Universal Natural Product Database, and the approved small-molecule drugs. The results indicated that the three databases have a very similar ADMET profile. Besides, most of the LANaPDB compounds presented high bioavailability, volume of distribution, plasma protein binding rate, blood-brain barrier penetration, susceptibility to CYP3A4, and half-life less than 12 h. Moreover, most of the LANaPDB compounds were predicted with a low probability of inducing toxicity-related reactions. The third version of LANaPDB and the codes for the curation and determination of 41 ADMET-related parameters are freely available at https://doi.org/10.5281/zenodo.15595030. 
The code is general and can be used to analyze other compound libraries.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 11-12","pages":"e70013"},"PeriodicalIF":3.1,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12694011/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145715146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Exploration of (Ultra)Big Chemical Spaces.
IF 3.1, Zone 4, Medicine
Molecular Informatics Pub Date : 2025-10-01 DOI: 10.1002/minf.70012
José L Medina-Franco
{"title":"Exploration of (Ultra)Big Chemical Spaces.","authors":"José L Medina-Franco","doi":"10.1002/minf.70012","DOIUrl":"https://doi.org/10.1002/minf.70012","url":null,"abstract":"","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 10","pages":"e70012"},"PeriodicalIF":3.1,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145588154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Ligand B-Factor Index: A Metric for Prioritizing Protein-Ligand Complexes in Docking.
IF 3.1, Zone 4, Medicine
Molecular Informatics Pub Date : 2025-09-01 DOI: 10.1002/minf.70010
Liliana Halip, Cristian Neanu, Sorin Avram
{"title":"Ligand B-Factor Index: A Metric for Prioritizing Protein-Ligand Complexes in Docking.","authors":"Liliana Halip, Cristian Neanu, Sorin Avram","doi":"10.1002/minf.70010","DOIUrl":"10.1002/minf.70010","url":null,"abstract":"<p><p>Docking is a structure-based cheminformatics tool broadly employed in early drug discovery. Based on the tridimensional structure of the protein target, docking is used to predict the binding interactions between the protein and a ligand, estimate the corresponding binding affinity, or perform virtual screenings (VSs) to identify new active compounds. This study introduces the ligand B-factor index (LBI), a novel computational metric for prioritizing protein-ligand complexes for docking. Unlike other metrics, LBI directly compares atomic displacements in the ligand and binding site. LBI is defined as the ratio of the median atomic B-factor of the binding site to that of the bound ligand. Using the comparative assessment of scoring functions (CASF-2016) dataset, we evaluated the effectiveness of LBI in guiding the selection of protein-ligand complexes to enhance docking performance. Our results show a moderate correlation (Spearman ρ ~ 0.48) between LBI and the experimental binding affinities, outperforming several docking scoring functions. Additionally, LBI correlates with improved redocking success (root mean square deviation < 2 Å), underscoring the significance of a ligand-focused metric. While LBI outperforms other metrics such as the protein B-factor index and resolution, its utility in VS docking remains to be further investigated. 
LBI is easy to compute, interpretable, applicable in structure-based cheminformatics, and freely available for calculation at https://chembioinf.ro/tool-bi-computing.html.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 9","pages":"e202500127"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12423484/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145033654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
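The abstract above defines LBI as the ratio of the median atomic B-factor of the binding site to that of the bound ligand. A minimal sketch of that ratio is shown below. It assumes the B-factors of the two atom groups have already been extracted from a structure file; how the binding site is delimited is not stated in the abstract, and the function name `ligand_b_factor_index` and the numbers used are hypothetical.

```python
from statistics import median


def ligand_b_factor_index(site_b_factors, ligand_b_factors):
    """Ratio of the median binding-site B-factor to the median ligand B-factor."""
    if not site_b_factors or not ligand_b_factors:
        raise ValueError("both atom groups need at least one B-factor")
    ligand_median = median(ligand_b_factors)
    if ligand_median <= 0:
        raise ValueError("ligand median B-factor must be positive")
    return median(site_b_factors) / ligand_median


# Made-up B-factors (in square angstroms), for illustration only.
site = [18.0, 20.0, 22.0, 25.0, 30.0]
ligand = [24.0, 26.0, 28.0, 30.0]
lbi = ligand_b_factor_index(site, ligand)
print(round(lbi, 3))
```

Using the median rather than the mean keeps a few highly mobile atoms from dominating either group, which is presumably why the definition is phrased that way.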
Machine Learning-Based Identification of Petroleum Distillates and Gasoline Traces Using Measured and Synthetic GC Spectra from Collected Samples.
IF 3.1, Zone 4, Medicine
Molecular Informatics Pub Date : 2025-08-01 DOI: 10.1002/minf.70008
Omer Kaspi, Yaniv Y Avissar, Arnon Grafit, Ron Chibel, Olga Girshevitz, Hanoch Senderowitz
{"title":"Machine Learning-Based Identification of Petroleum Distillates and Gasoline Traces Using Measured and Synthetic GC Spectra from Collected Samples.","authors":"Omer Kaspi, Yaniv Y Avissar, Arnon Grafit, Ron Chibel, Olga Girshevitz, Hanoch Senderowitz","doi":"10.1002/minf.70008","DOIUrl":"https://doi.org/10.1002/minf.70008","url":null,"abstract":"<p><p>Ignition cases involving arsons are typically handled by forensic experts who examine spectra of samples collected from scenes of fire to test for the existence or absence of ignitable liquids. This is tedious work, since many cases do not involve such liquids. To facilitate this process, we have developed in this work a Machine Learning (ML)-based workflow for samples' classification based on their gas chromatography (GC) chromatograms (i.e., spectra). To this end, annotated spectra of 181 samples containing three groups of liquids (petroleum distillates, gasoline, and an assortment of other substances) collected from fire scenes as well as two reference databases were obtained from the Israeli Department of Identification and Forensic Sciences (DIFS). These spectra were used for the derivation of ML-based classification models using three algorithms, namely, kNN, representative spectrum, and random forest (RF) giving rise to reliable predictions. To increase the size of the dataset to a level that would enable the usage of more advanced ML algorithms, we have used the experimental spectra to develop a new spectra synthesis algorithm and utilized it to generate a large dataset of synthetic spectra. This dataset was used for the derivation of new kNN, RF, and representative spectrum models as well as deep learning (DL) models producing F1-scores over an independent test set composed entirely of \"real\" spectra ranging from 0.74-0.95, 0.86-0.95, 0.30-0.75, and 0.85-0.96 for kNN, RF, representative spectrum, and DL, respectively. 
Following the completion of the work, a second set of real spectra was provided to us by DIFS, and modeling it with the second set of models yielded F1-scores ranging from 0.92-0.96, 0.96-1.00, 0.71-0.82, and 0.95-0.98 for kNN, RF, representative spectrum, and DL, respectively. These results therefore suggest that for this dataset, performances depend more on the size of the dataset used for model training than on the ML algorithm. We propose that the workflow and spectra synthesis algorithm developed in this work could be readily applied to other forensic domains where samples are characterized by spectra, either solely or in combination with other parameters.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 8","pages":"e202400371"},"PeriodicalIF":3.1,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12371388/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144961933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Integrating Generative Pretrained Transformer and Genetic Algorithms for Efficient and Diverse Molecular Generation.
IF 3.1, Zone 4, Medicine
Molecular Informatics Pub Date : 2025-08-01 DOI: 10.1002/minf.70005
Chengcheng Xu, Chen Zeng, Xi Yang, Yingxu Liu, Xiangzhen Ning, Lidan Zheng, Yang Liu, Qing Fan, Chao Xu, Haichun Liu, Xian Wei, Yadong Chen, Yanmin Zhang, Rui Gu
{"title":"Integrating Generative Pretrained Transformer and Genetic Algorithms for Efficient and Diverse Molecular Generation.","authors":"Chengcheng Xu, Chen Zeng, Xi Yang, Yingxu Liu, Xiangzhen Ning, Lidan Zheng, Yang Liu, Qing Fan, Chao Xu, Haichun Liu, Xian Wei, Yadong Chen, Yanmin Zhang, Rui Gu","doi":"10.1002/minf.70005","DOIUrl":"https://doi.org/10.1002/minf.70005","url":null,"abstract":"<p><p>In computer-aided drug design, molecular generation models play a crucial role in accelerating the drug development process. Current models mainly fall into two categories: deep learning models with high performance but poor interpretability and heuristic algorithms with better interpretability but limited performance. In this study, we introduce an innovative molecular generation model, the compound construction model (CCMol), which integrates the powerful generative capabilities of the generative pretrained transformer (GPT) and the efficient optimization mechanisms of genetic algorithms (GA) to achieve effective and innovative molecular structures. Specifically, our approach uses structure-based drug design comprising both ligand and protein primary structure-based aspects. CCMol integrates GPT for initial molecular generation and GA for iterative optimization of physicochemical and biological properties. The model's reliability was validated by generating molecules targeting three critical disease-related proteins (GLP1, WRN, and JAK2). 
The results indicate that CCMol is on par with current advanced models on multiple indicators and performs better than the baseline model in terms of structure diversity and drug-related property indicators, demonstrating that CCMol exhibits outstanding performance in developing novel and effective candidate drug molecules, particularly suitable for expanding the chemical validity of candidate structures at the early stages of drug discovery.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 8","pages":"e202500094"},"PeriodicalIF":3.1,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144784859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LiProS: Findable, Accessible, Interoperable, and Reusable Data Simulation Workflow to Predict Accurate Lipophilicity Profiles for Small Molecules.
IF 3.1, Zone 4, Medicine
Molecular Informatics Pub Date : 2025-08-01 DOI: 10.1002/minf.70007
Esteban Bertsch-Aguilar, Antonio Piedra, Daniel Acuña, Sebastián Suñer, Sylvana Pinheiro, William J Zamora
{"title":"LiProS: Findable, Accessible, Interoperable, and Reusable Data Simulation Workflow to Predict Accurate Lipophilicity Profiles for Small Molecules.","authors":"Esteban Bertsch-Aguilar, Antonio Piedra, Daniel Acuña, Sebastián Suñer, Sylvana Pinheiro, William J Zamora","doi":"10.1002/minf.70007","DOIUrl":"https://doi.org/10.1002/minf.70007","url":null,"abstract":"<p><p>Lipophilicity is a fundamental physicochemical property widely used to evaluate key parameters in drug design, materials science, and food engineering. It plays a critical role in predicting membrane permeability, absorption, and distribution of compounds. Moreover, lipophilicity is commonly integrated into scoring functions to model biomolecular interactions and serves as an important molecular descriptor in machine learning models for property prediction and compound classification. The selection of the appropriate pH-dependent lipophilicity ( <math> <semantics><mrow><mi>log</mi> <msub><mi>D</mi> <mrow><mtext>pH</mtext></mrow> </msub> </mrow> <annotation>$$ \\mathrm{log} {D}_{pH} $$</annotation></semantics> </math> ) model is important to ensure its accuracy. The incorporation of the ion apparent partition coefficient ( <math> <semantics> <mrow><msubsup><mi>P</mi> <mi>I</mi> <mtext>app</mtext></msubsup> </mrow> <annotation>$$ {P}_{\\text{I}}^{\\text{app}}$$</annotation></semantics> </math> ) into predictions of pH-dependent lipophilicity profiles can be essential for accurately reproducing experimental results. In accordance with the principles for findable, accessible, interoperable, and reusable data to improve data management and sharing, here, we introduce LiProS, a FAIR workflow that is easily accessible through a Google Colab notebook. LiProS assists researchers in efficiently determining the appropriate pH-dependent lipophilicity profile based on the SMILES code of their molecules of interest. 
In addition, LiProS demonstrated its utility in the analysis of ionizable compounds within the NAPRORE-CR natural products database, enabling the identification of the most appropriate lipophilicity formalism tailored to the physicochemical characteristics of these compounds.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 8","pages":"e202500136"},"PeriodicalIF":3.1,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144962005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
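To illustrate why the ion apparent partition coefficient can matter for a pH-dependent lipophilicity profile, the sketch below uses the textbook formalism for a monoprotic acid, D = (P_N + P_I_app * 10^(pH - pKa)) / (1 + 10^(pH - pKa)). Whether LiProS uses exactly this expression is an assumption; the abstract does not spell out its formalisms. The function name `log_d_monoprotic_acid` and the input values are hypothetical.

```python
import math


def log_d_monoprotic_acid(log_p_neutral, pka, ph, log_p_ion_app=None):
    """log D of a monoprotic acid; ion partitioning is ignored when log_p_ion_app is None."""
    delta = 10.0 ** (ph - pka)  # aqueous ratio of ionized to neutral species
    p_neutral = 10.0 ** log_p_neutral
    p_ion = 0.0 if log_p_ion_app is None else 10.0 ** log_p_ion_app
    return math.log10((p_neutral + p_ion * delta) / (1.0 + delta))


# Made-up acid: log P_N = 3.0, pKa = 4.5, log P_I_app = -0.5 (illustration only).
for ph in (2.0, 4.5, 7.4):
    without_ion = log_d_monoprotic_acid(3.0, 4.5, ph)
    with_ion = log_d_monoprotic_acid(3.0, 4.5, ph, log_p_ion_app=-0.5)
    print(f"pH {ph}: log D = {without_ion:.2f} (neutral only), {with_ion:.2f} (with ion)")
```

The two variants agree at low pH, where the neutral form dominates, and diverge as the ionized fraction grows, which is the regime where the choice of formalism changes the predicted profile.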
Structural Flexibility and Shape Similarity Contribute to Exclusive Functions of Certain Atg8 Isoforms in the Autophagy Process.
IF 2.8, Zone 4, Medicine
Molecular Informatics Pub Date : 2025-07-01 DOI: 10.1002/minf.70004
Alexey Rayevsky, Eliah Bulgakov, Mariia Stykhylias, Sergey Ozheredov, Svetlana Spivak, Yaroslav Blume
{"title":"Structural Flexibility and Shape Similarity Contribute to Exclusive Functions of Certain Atg8 Isoforms in the Autophagy Process.","authors":"Alexey Rayevsky, Eliah Bulgakov, Mariia Stykhylias, Sergey Ozheredov, Svetlana Spivak, Yaroslav Blume","doi":"10.1002/minf.70004","DOIUrl":"https://doi.org/10.1002/minf.70004","url":null,"abstract":"<p><p>Despite the abundance of systematically collected experimental data and facts, the multistep process of autophagy still contains many dark spots. One concerns the background selectivity of interactions between certain autophagy-related protein (ATG8) isoforms and their receptors/adaptors in plants during the autophagy process. By regulating phagophore initiation, expansion, and maturation, these proteins control the assembly of numerous autophagy proteins at this key docking platform. Bioinformatics analysis of human, yeast, and plant ATG8 amino acid sequences allows us to build a sequence tree of plant ATG8s, divided into three groups. We perform a structural study aimed at revealing some of the underlying reasons for the differences in the selectivity of ATG8 isoforms. A series of molecular dynamics (MD) simulations are performed to explain the stage-dependent functionality of ATG8. The conserved secondary structure and folding across all ATG8 proteins, resulting in nearly identical protein-protein interaction interfaces, makes this study particularly important and interesting. Recognizing the dual role of the LC3 interacting region (LIR) in autophagosome biogenesis and recruitment of the anchored selective autophagy receptor (SAR), we perform a mobility domain analysis. To this end, the amino acid sequence associated with the LIR docking site (LDS) interface is localized and subjected to root mean square deviation (RMSD)-based clustering analysis. 
Starting from Atg8-targeted protein-peptide docking, we attempt to identify conformational changes in the contact region of the corresponding adaptors and receptors involved in the common biogenesis events in autophagy. For the molecular dynamics, we select three representatives, sharing common patterns with other members of the groups. The resulting ATG8-peptide complexes display a significant preference for binding specific partners by different ATG8 isotypes.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 7","pages":"e202500025"},"PeriodicalIF":2.8,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144659700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Network Analysis of the Organic Chemistry in Patents, Literature, and Pharmaceutical Industry.
IF 2.8, Zone 4, Medicine
Molecular Informatics Pub Date : 2025-07-01 DOI: 10.1002/minf.202500011
Emma Svensson, Emma Granqvist, Tomas Bastys, Christos Kannas, Mikhail Kabeshov, Samuel Genheden, Ola Engkvist, Thierry Kogej
{"title":"Network Analysis of the Organic Chemistry in Patents, Literature, and Pharmaceutical Industry.","authors":"Emma Svensson, Emma Granqvist, Tomas Bastys, Christos Kannas, Mikhail Kabeshov, Samuel Genheden, Ola Engkvist, Thierry Kogej","doi":"10.1002/minf.202500011","DOIUrl":"10.1002/minf.202500011","url":null,"abstract":"<p><p>Chemical reactions can be connected in large networks such as knowledge graphs. In this way, prior work has been able to draw meaningful conclusions about the properties and structures involved in organic chemistry reactions. However, the research has focused on public sources of organic synthesis that might lack the intricate details of the synthetic routes used in in-house drug discovery. In this work, previous analyses are expanded to also include an in-house electronic lab notebook (ELN) source, such that we can compare it to knowledge graphs that were constructed from US Patent and Trademark Office (USPTO) and Reaxys. We found that the Reaxys knowledge graph is the most interconnected and has the largest proportion of nodes belonging to the core, whereas the USPTO is much less connected and only has a small core. The ELN knowledge graph falls between these extremes in connectivity and it does not have any core. The hub molecules of ELN and USPTO are most similar, primarily represented by small, organic building blocks. We hypothesize that these differences can be attributed to the different origins of the data in the three sources. 
We discuss what impact this might have on synthesis prediction modelling.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 7","pages":"e202500011"},"PeriodicalIF":2.8,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12273192/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144659699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
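The abstract above refers to hub molecules in a reaction knowledge graph. One simple way to surface such hubs is to count how many reactions each molecule takes part in, as sketched below. This degree-based ranking is an assumption made for illustration; the abstract does not give the paper's own definitions of hub or core. The functions `molecule_degrees` and `top_hubs` and the toy reaction list are hypothetical.

```python
from collections import Counter


def molecule_degrees(reactions):
    """Count how many reactions each molecule takes part in, as reactant or product."""
    degree = Counter()
    for reactants, products in reactions:
        for molecule in set(reactants) | set(products):
            degree[molecule] += 1
    return degree


def top_hubs(reactions, n=3):
    """Return the n most connected molecules; ties are broken alphabetically."""
    degree = molecule_degrees(reactions)
    return sorted(degree, key=lambda m: (-degree[m], m))[:n]


# Toy reactions as (reactant SMILES, product SMILES), for illustration only.
reactions = [
    (["CCO", "CC(=O)O"], ["CCOC(C)=O"]),
    (["CC(=O)O", "CN"], ["CNC(C)=O"]),
    (["CC(=O)O", "OCc1ccccc1"], ["CC(=O)OCc1ccccc1"]),
    (["CCO", "O=C(O)c1ccccc1"], ["CCOC(=O)c1ccccc1"]),
]
print(top_hubs(reactions, n=2))
```

In this toy set the small building blocks come out on top, which mirrors the abstract's observation that hubs are primarily small organic building blocks.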