Journal of Cheminformatics: Latest Articles

Identifying uncertainty in physical–chemical property estimation with IFSQSAR
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date: 2024-05-30 DOI: 10.1186/s13321-024-00853-w
Trevor N. Brown, Alessandro Sangion, Jon A. Arnot
Abstract: This study describes the development and evaluation of six new models for predicting physical–chemical (PC) properties that are highly relevant for chemical hazard, exposure, and risk estimation: solubility (in water, S_W, and in octanol, S_O), vapor pressure (VP), and the octanol–water (K_OW), octanol–air (K_OA), and air–water (K_AW) partition ratios. The models are implemented in version 1.1.0 of the Iterative Fragment Selection Quantitative Structure–Activity Relationship (IFSQSAR) Python package as Poly-Parameter Linear Free Energy Relationship (PPLFER) equations, which combine experimentally calibrated system parameters with solute descriptors predicted by QSPRs. Two ancillary models were also developed and implemented: a QSPR for molar volume (MV) and a classifier for the physical state of chemicals at room temperature. The IFSQSAR methods for characterizing the applicability domain (AD) and for calculating uncertainty estimates, expressed as 95% prediction intervals (PIs), are described and tested on 9,000 measured partition ratios and 4,000 VP and S_W values. These measured data are external to the IFSQSAR training and validation datasets and are used to assess, in an unbiased manner, how well the models predict properties of "novel chemicals". The 95% PIs calculated from the validation datasets for partition ratios had to be scaled by a factor of 1.25 to capture 95% of the external data. Predictions for VP and S_W are more uncertain, primarily because of the difficulty of determining a chemical's physical state (i.e., liquid or solid) at room temperature. For novel, data-poor chemicals, the root mean squared error of prediction (RMSEP) for log K_OW, log K_AW and log K_OA is estimated to be in the range of 0.7 to 1.4, with RMSEP in the range of 1.7 to 1.8 for log VP and log S_W.

Scientific contribution: The new partitioning models integrate empirical PPLFER equations and QSARs, allowing experimental data and model predictions to be combined seamlessly. This work tests the real predictivity of the models for novel chemicals that are not in the model training or external validation datasets.

Volume 16, Issue 1. PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00853-w
Citations: 0
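The IFSQSAR abstract relies on three quantitative ideas: a PPLFER equation, 95% prediction intervals that had to be widened by a factor of 1.25, and RMSEP. The sketch below illustrates them in plain Python. It assumes a generic Abraham-type PPLFER form, log K = c + eE + sS + aA + bB + vV, and normally distributed errors (PI = prediction ± 1.96σ). The function names and toy numbers are hypothetical; this is not the IFSQSAR API and does not use its calibrated system parameters.

```python
import math

# Illustrative sketch only: a generic Abraham-type PPLFER and PI bookkeeping,
# not the IFSQSAR package API or its fitted system parameters.


def pplfer_log_k(system, solute):
    """log K = c + e*E + s*S + a*A + b*B + v*V (system parameters x solute descriptors)."""
    terms = (("e", "E"), ("s", "S"), ("a", "A"), ("b", "B"), ("v", "V"))
    return system["c"] + sum(system[p] * solute[d] for p, d in terms)


def prediction_interval(pred, sigma, scale=1.0, z=1.96):
    """Symmetric ~95% PI assuming normal errors; `scale` widens it (e.g. 1.25)."""
    half_width = z * sigma * scale
    return pred - half_width, pred + half_width


def coverage(preds, observed, sigmas, scale=1.0):
    """Fraction of observed values that fall inside their (scaled) intervals."""
    inside = 0
    for pred, obs, sigma in zip(preds, observed, sigmas):
        low, high = prediction_interval(pred, sigma, scale)
        if low <= obs <= high:
            inside += 1
    return inside / len(observed)


def rmsep(preds, observed):
    """Root mean squared error of prediction."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(preds, observed)) / len(observed))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    preds, obs, sigmas = [0.0] * 4, [0.5, 1.5, 2.2, -2.0], [1.0] * 4
    print(coverage(preds, obs, sigmas))              # unscaled intervals
    print(coverage(preds, obs, sigmas, scale=1.25))  # intervals widened by 1.25
    print(rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

With unscaled intervals, two of the four toy observations fall inside (coverage 0.5); widening the intervals by 1.25 captures all four. The same coverage check on external data is one way to choose such a scale factor.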
MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date: 2024-05-30 DOI: 10.1186/s13321-024-00861-w
Morgan Thomas, Noel M. O’Boyle, Andreas Bender, Chris De Graaf
Abstract: Generative models are being rapidly researched and applied to de novo drug design. To facilitate their application and evaluation, we present MolScore. MolScore already contains many drug-design-relevant scoring functions commonly used in benchmarks, such as molecular similarity, molecular docking, predictive models, synthesizability, and more. In addition, it provides performance metrics to evaluate generative model performance based on the chemistry generated. With this unification of functionality, MolScore re-implements commonly used benchmarks in the field (such as GuacaMol, MOSES, and MolOpt). Moreover, new benchmarks can be created trivially. We demonstrate this by testing a chemical language model with reinforcement learning on three new tasks of increasing complexity related to the design of 5-HT2a ligands that utilise either molecular descriptors, 266 pre-trained QSAR models, or dual molecular docking. Lastly, MolScore can be integrated into an existing Python script with just three lines of code. This framework is a step towards unifying generative model application and evaluation as applied to drug design, for both practitioners and researchers. The framework can be found on GitHub and downloaded directly from the Python Package Index.

Scientific contribution: MolScore is an open-source platform that facilitates generative molecular design, and its evaluation, for application in drug design. This platform takes important steps towards unifying existing benchmarks, provides a platform for sharing new benchmarks, and improves customisation, flexibility and usability for practitioners over existing solutions.

Volume 16, Issue 1. PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00861-w
Citations: 0
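The MolScore abstract mentions performance metrics computed from the chemistry a generative model produces, and re-implementations of benchmarks such as GuacaMol and MOSES. Two metrics commonly reported by such benchmarks are uniqueness and novelty. The sketch below computes both on SMILES strings; it assumes the inputs are already valid, canonical SMILES (real tools usually canonicalize with a cheminformatics toolkit first), and its function names are hypothetical rather than part of MolScore's API.

```python
def uniqueness(generated):
    """Fraction of generated molecules that are distinct.

    Assumes `generated` holds canonical SMILES, so string equality
    stands in for molecular identity."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated molecules that are absent from the training set."""
    distinct = set(generated)
    if not distinct:
        return 0.0
    return len(distinct - set(training)) / len(distinct)


if __name__ == "__main__":
    sample = ["CCO", "CCO", "c1ccccc1", "CCN"]  # toy generated batch
    train = ["CCO", "CCC"]                      # toy training set
    print(uniqueness(sample), novelty(sample, train))
```

Because string identity replaces molecular identity here, two non-canonical SMILES for the same molecule would be counted as distinct, which is why canonicalization normally comes first.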
Consensus holistic virtual screening for drug discovery: a novel machine learning model approach
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date: 2024-05-28 DOI: 10.1186/s13321-024-00855-8
Said Moshawih, Zhen Hui Bu, Hui Poh Goh, Nurolaini Kifli, Lam Hong Lee, Khang Wen Goh, Long Chiau Ming
Abstract: In drug discovery, virtual screening is crucial for identifying potential hit compounds. This study presents a novel pipeline that uses machine learning models to amalgamate various conventional screening methods. A diverse array of protein targets was selected, and their corresponding datasets were subjected to active/decoy distribution analysis prior to scoring with four distinct methods: QSAR, pharmacophore, docking, and 2D shape similarity, which were ultimately integrated into a single consensus score. The fine-tuned machine learning models were ranked using the novel formula "w_new", consensus scores were calculated, and an enrichment study was performed for each target. Consensus scoring outperformed the other methods on specific protein targets such as PPARG and DPP4, achieving AUC values of 0.90 and 0.84, respectively. This approach also consistently prioritized compounds with higher experimental pIC50 values than all the other screening methodologies did. Moreover, the models showed moderate to high performance in terms of R² values during external validation. In conclusion, this novel workflow consistently delivered superior results, emphasizing the significance of a holistic approach in drug discovery, in which both quantitative metrics and active enrichment play pivotal roles in identifying the best virtual screening methodology.

Scientific contribution: We presented a novel consensus scoring workflow for virtual screening that merges diverse methods for enhanced compound selection. We also introduced "w_new", a metric that refines machine learning model rankings by weighing various model-specific parameters, with the aim of improving their efficacy in drug discovery as well as in other domains.

Volume 16, Issue 1. PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00855-8
Citations: 0
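The consensus workflow above merges several screening scores into one and evaluates it by AUC. A minimal sketch of that general idea, under stated assumptions: each method's scores are min–max normalized (flipping docking, where more negative is better), combined as a weighted mean, and evaluated with ROC AUC computed as the probability that a randomly chosen active outranks a randomly chosen decoy. The equal weights, toy scores, and function names are hypothetical, and this does not reproduce the paper's "w_new" formula, which the abstract does not spell out. Pharmacophore and shape-similarity scores would slot in as further entries of the same dictionary.

```python
def min_max(values, higher_is_better=True):
    """Rescale scores to [0, 1]; invert when lower raw scores are better."""
    low, high = min(values), max(values)
    if high == low:
        return [0.5] * len(values)
    scaled = [(v - low) / (high - low) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def consensus(method_scores, weights, higher_is_better):
    """Weighted mean of normalized scores; every list uses the same compound order."""
    n = len(next(iter(method_scores.values())))
    total_weight = sum(weights[m] for m in method_scores)
    combined = [0.0] * n
    for method, raw in method_scores.items():
        for i, s in enumerate(min_max(raw, higher_is_better[method])):
            combined[i] += weights[method] * s / total_weight
    return combined


def roc_auc(scores, labels):
    """P(random active scores above random decoy); ties count as 0.5."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]                      # toy: two actives, two decoys
    scores = {
        "qsar": [0.9, 0.8, 0.2, 0.1],          # predicted activity, higher is better
        "docking": [-10.0, -9.0, -5.0, -6.0],  # docking score, lower is better
    }
    weights = {"qsar": 1.0, "docking": 1.0}
    direction = {"qsar": True, "docking": False}
    print(roc_auc(consensus(scores, weights, direction), labels))
```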
TransExION: a transformer based explainable similarity metric for comparing IONS in tandem mass spectrometry
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date: 2024-05-28 DOI: 10.1186/s13321-024-00858-5
Danh Bui-Thi, Youzhong Liu, Jennifer L. Lippens, Kris Laukens, Thomas De Vijlder
Abstract: Small molecule identification is a crucial task in analytical chemistry and the life sciences. Mass spectrometry is one of the most commonly used technologies for elucidating small molecule structures. Spectral library search of product ion spectra (MS/MS) is a popular strategy for identifying molecules or finding structural analogues. This approach relies on the assumption that spectral similarity and structural similarity are correlated. However, popular spectral similarity measures, which are usually calculated from identical fragment matches between MS/MS spectra, do not always accurately reflect structural similarity. In this study, we propose TransExION, a Transformer-based Explainable similarity metric for IONS. TransExION detects related fragments between MS/MS spectra through their mass differences and uses these to estimate spectral similarity. These related fragments can be nearly identical, but can also share a substructure. TransExION also provides a post-hoc explanation of its estimate, which can support scientists in evaluating spectral library search results and thus in elucidating the structures of unknown molecules. Our model has a Transformer-based architecture and is trained on data derived from the GNPS MS/MS libraries. The experimental results show that it improves on existing spectral similarity measures in searching for and interpreting structural analogues, as well as in molecular networking.

Volume 16, Issue 1. PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00858-5
Citations: 0
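The TransExION abstract contrasts identical-fragment matching with the use of mass differences between fragments. The sketch below is not TransExION, which is a trained Transformer; it only illustrates the limitation that motivates it. A greedy cosine score that matches peaks within an m/z tolerance gives zero similarity for two spectra whose fragments are all shifted by about 14.016 Da (roughly the mass of a CH2 unit), whereas the matrix of pairwise mass differences exposes that consistent shift. The tolerance value, toy peak lists, and function names are assumptions for illustration.

```python
import math


def cosine_similarity(spec_a, spec_b, tolerance=0.02):
    """Greedy cosine score over (mz, intensity) peaks matched within `tolerance`.

    Each peak may be used at most once; highest intensity products are matched first."""
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((int_a * int_b, i, j))
    candidates.sort(reverse=True)
    used_a, used_b, score = set(), set(), 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            score += product
    norm_a = math.sqrt(sum(x * x for _, x in spec_a))
    norm_b = math.sqrt(sum(x * x for _, x in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return score / (norm_a * norm_b)


def mass_differences(spec_a, spec_b):
    """Matrix of m/z differences (b minus a) between every pair of fragments."""
    return [[mz_b - mz_a for mz_b, _ in spec_b] for mz_a, _ in spec_a]


if __name__ == "__main__":
    a = [(100.0, 1.0), (150.0, 2.0)]
    b = [(114.016, 1.0), (164.016, 2.0)]  # same pattern shifted by ~CH2
    print(cosine_similarity(a, a), cosine_similarity(a, b))
    print(mass_differences(a, b))
```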
Solvent flashcards: a visualisation tool for sustainable chemistry
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date: 2024-05-28 DOI: 10.1186/s13321-024-00854-9
Joseph Heeley, Samuel Boobier, Jonathan D. Hirst
Abstract: Selecting greener solvents during experiment design is imperative for greener chemistry. While many solvent selection guides are currently used in the pharmaceutical industry, these are often paper-based, which can make it difficult to identify and compare specific solvents. This work presents a stand-alone version of the solvent flashcards that were developed as part of the AI4Green electronic laboratory notebook. The software provides an intuitive, interactive interface for visualising data from CHEM21, a pharmaceutical solvent selection guide that categorises solvents according to "greenness". This open-source software is written in Python, JavaScript, HTML and CSS and allows users to directly contrast and compare specific solvents by generating colour-coded flashcards. It can be installed locally using pip; alternatively, the source code is available on GitHub: https://github.com/AI4Green/solvent_flashcards. The documentation can be found on GitHub or on the corresponding Python Package Index webpage: https://pypi.org/project/solvent-guide/.

Volume 16, Issue 1. PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00854-9
Citations: 0
Development of scoring-assisted generative exploration (SAGE) and its application to dual inhibitor design for acetylcholinesterase and monoamine oxidase B
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date: 2024-05-24 DOI: 10.1186/s13321-024-00845-w
Hocheol Lim
Abstract: De novo molecular design is the process of searching chemical space for drug-like molecules with desired properties, and deep learning has been recognized as a promising solution. In this study, I developed an effective computational method called Scoring-Assisted Generative Exploration (SAGE) to enhance chemical diversity and property optimization through virtual synthesis simulation, the generation of bridged bicyclic rings, and multiple scoring models for drug-likeness. For six protein targets, SAGE generated molecules with high scores within a reasonable number of steps by optimizing target specificity, both without constraints and with multiple constraints such as synthetic accessibility, solubility, and metabolic stability. Furthermore, through multiple desired-property optimization, I proposed a top-ranked SAGE-generated molecule as a dual inhibitor of acetylcholinesterase and monoamine oxidase B. SAGE can therefore generate molecules with desired properties by optimizing multiple properties simultaneously, indicating the importance of de novo design strategies in the future of drug discovery and development.

Volume 16, Issue 1. PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00845-w
Citations: 0
CineMol: a programmatically accessible direct-to-SVG 3D small molecule drawer
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date: 2024-05-23 DOI: 10.1186/s13321-024-00851-y
David Meijer, Marnix H. Medema, Justin J. J. van der Hooft
Abstract: Effective visualization of small molecules is paramount for conveying concepts and results in cheminformatics. Scalable vector graphics (SVG) are preferred for such visualizations, because SVGs can easily be altered in post-production and exported to other formats. A wide spectrum of software applications already exists that can visualize molecules and customize these visualizations in many ways. However, software packages that can output 3D models projected onto a 2D canvas directly as SVG, while being programmatically accessible from Python, are lacking. Here, we introduce CineMol, which can draw vectorized approximations of three-dimensional small molecule models in seconds, without triangulation or ray tracing, producing files of around 50–300 kilobytes per molecule model for compounds with up to 45 heavy atoms. The SVGs output by CineMol can readily be modified in popular vector graphics editing applications. CineMol is written in Python and can be incorporated into any existing Python cheminformatics workflow, as it depends only on native Python libraries. CineMol also provides programmatic access to all of its internal states, allowing per-atom and per-bond customization. CineMol's capacity to programmatically create molecular visualizations suitable for post-production offers researchers and scientists a powerful tool for enhancing the clarity and visual impact of their presentations and publications in cheminformatics, metabolomics, and related scientific disciplines.

Scientific contribution: We introduce CineMol, a Python-based tool that enables cheminformatics researchers to generate high-quality two-dimensional SVG approximations of three-dimensional small molecule models directly, all within a programmable Python framework. CineMol offers a unique combination of speed, efficiency, and accessibility, making it an indispensable tool for researchers in cheminformatics, especially when working with SVG visualizations.

Volume 16, Issue 1. PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00851-y
Citations: 0
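CineMol draws vectorized approximations of 3D models directly as SVG, without triangulation or ray tracing. The sketch below shows the simplest form of that general idea and is not CineMol's algorithm or API: atoms are projected orthographically onto the x–y plane, drawn as circles sized by approximate covalent radii, and sorted back to front by z (the painter's algorithm) so that nearer atoms overlap farther ones. The colours, radii, scale, and function name are illustrative assumptions; bonds, shading and perspective are omitted.

```python
# Approximate covalent radii (angstrom) and Jmol-style colours; illustrative values.
RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
COLOURS = {"H": "#ffffff", "C": "#909090", "N": "#3050f8", "O": "#ff0d0d"}


def atoms_to_svg(atoms, scale=40.0, padding=40.0):
    """Render (symbol, x, y, z) atoms as an SVG string; the viewer sits on the +z side."""
    xs = [atom[1] for atom in atoms]
    ys = [atom[2] for atom in atoms]
    min_x, max_y = min(xs), max(ys)
    width = (max(xs) - min_x) * scale + 2 * padding
    height = (max_y - min(ys)) * scale + 2 * padding
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width:.1f}" height="{height:.1f}">']
    # Painter's algorithm: draw the farthest atoms (smallest z) first.
    for symbol, x, y, z in sorted(atoms, key=lambda atom: atom[3]):
        cx = (x - min_x) * scale + padding
        cy = (max_y - y) * scale + padding  # SVG y axis points down
        r = RADII.get(symbol, 0.75) * scale
        fill = COLOURS.get(symbol, "#ff1493")  # fallback colour for other elements
        parts.append(f'<circle cx="{cx:.2f}" cy="{cy:.2f}" r="{r:.2f}" '
                     f'fill="{fill}" stroke="black" stroke-width="1"/>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Toy three-atom input; coordinates invented for illustration.
    toy = [("O", 0.0, 0.0, 0.0), ("H", 0.76, 0.59, -0.5), ("H", -0.76, 0.59, 0.5)]
    print(atoms_to_svg(toy))
```

Sorting by depth is what lets a pure-SVG drawing fake occlusion without any 3D rendering; the trade-off is that intersecting spheres are only approximated.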
AiZynthFinder 4.0: developments based on learnings from 3 years of industrial application
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date: 2024-05-23 DOI: 10.1186/s13321-024-00860-x
Lakshidaa Saigiridharan, Alan Kai Hassen, Helen Lai, Paula Torren-Peraire, Ola Engkvist, Samuel Genheden
Abstract: We present an updated overview of the AiZynthFinder package for retrosynthesis planning. Since the first version was released in 2020, we have added a substantial number of new features based on user feedback. Feature enhancements include reaction filter policies, support for any one-step retrosynthesis model, a scoring framework and several additional search algorithms. To exemplify typical use cases of the software and highlight some learnings, we perform a large-scale analysis of several hundred thousand target molecules from diverse sources. This analysis examines, for instance, route shape, stock usage and exploitation of reaction space, and points out strengths and weaknesses of our retrosynthesis approach. The software is released as open source for educational purposes and to provide a reference implementation of the core algorithms for synthesis prediction. We hope that releasing the software as open source will further facilitate innovation in developing novel methods for synthetic route prediction. AiZynthFinder is fast, robust and extensible open-source software and can be downloaded from https://github.com/MolecularAI/aizynthfinder.

Volume 16, Issue 1. PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00860-x
Citations: 0
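The AiZynthFinder analysis looks at route shape and stock usage. A minimal sketch of a few such route statistics follows, assuming a simplified route representation rather than AiZynthFinder's own route objects: a nested dict in which a molecule's "children" are the reactants of the single reaction that makes it, and leaves are starting materials. It computes the longest linear sequence (reactions on the deepest branch), the total reaction count, and whether every starting material is in stock. The molecule labels and stock set are hypothetical.

```python
def longest_linear_sequence(node):
    """Number of reactions on the deepest branch of the route tree."""
    children = node.get("children", [])
    if not children:
        return 0
    return 1 + max(longest_linear_sequence(child) for child in children)


def reaction_count(node):
    """Total reactions: one per non-leaf molecule node in this representation."""
    children = node.get("children", [])
    if not children:
        return 0
    return 1 + sum(reaction_count(child) for child in children)


def starting_materials(node):
    """Leaf molecules, i.e. the building blocks the route relies on."""
    children = node.get("children", [])
    if not children:
        return [node["smiles"]]
    return [leaf for child in children for leaf in starting_materials(child)]


def is_solved(route, stock):
    """A route counts as solved when every starting material is in stock."""
    return all(mol in stock for mol in starting_materials(route))


if __name__ == "__main__":
    # Hypothetical route: target <- A + I, and I <- C + D (labels, not real SMILES).
    route = {"smiles": "target", "children": [
        {"smiles": "A"},
        {"smiles": "I", "children": [{"smiles": "C"}, {"smiles": "D"}]},
    ]}
    print(longest_linear_sequence(route), reaction_count(route),
          starting_materials(route), is_solved(route, {"A", "C", "D"}))
```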
Generative design of compounds with desired potency from target protein sequences using a multimodal biochemical language model
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date: 2024-05-22 DOI: 10.1186/s13321-024-00852-x
Hengwei Chen, Jürgen Bajorath
Abstract: Deep learning models adapted from natural language processing offer new opportunities for predicting active compounds via machine translation of sequential molecular data representations. For example, chemical language models are often derived for compound string transformation. Moreover, given the in-principle versatility of language models for translating different types of textual representations, off-the-beaten-path design tasks might be explored. In this work, we investigated generative design of active compounds with desired potency from target sequence embeddings, a rather challenging prediction task. For this purpose, a dual-component conditional language model was designed for learning from multimodal data. It comprised a protein language model component for generating target sequence embeddings and a conditional transformer for predicting new active compounds with desired potency. To this end, the designated "biochemical" language model was trained to learn mappings of combined protein sequence and compound potency value embeddings to corresponding compounds, fine-tuned on individual activity classes not encountered during model derivation, and evaluated on compound test sets that were structurally distinct from the training sets. The biochemical language model correctly reproduced known compounds with different potency for all activity classes, providing proof of concept for the approach. Furthermore, the conditional model consistently reproduced larger numbers of known compounds, as well as more potent compounds, than an unconditional model, revealing a substantial effect of potency conditioning. The biochemical language model also generated structurally diverse candidate compounds that departed from both the fine-tuning and the test compounds. Overall, generative compound design based on potency value-conditioned target sequence embeddings yielded promising results, making the approach attractive for further exploration and practical applications.

Volume 16, Issue 1. PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00852-x
Citations: 0
MolPROP: Molecular Property prediction with multimodal language and graph fusion
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date: 2024-05-22 DOI: 10.1186/s13321-024-00846-9
Zachary A. Rollins, Alan C. Cheng, Essam Metwally
Abstract: Pretrained deep learning models self-supervised on large datasets of language, image, and graph representations are often fine-tuned on downstream tasks and have demonstrated remarkable adaptability in a variety of applications, including chatbots, autonomous driving, and protein folding. Further research aims to improve performance on downstream tasks by fusing high-dimensional data representations across multiple modalities. In this work, we explore a novel fusion of a pretrained language model, ChemBERTa-2, with graph neural networks for the task of molecular property prediction. We benchmark the MolPROP suite of models on seven scaffold-split MoleculeNet datasets and compare them with state-of-the-art architectures. We find that (1) multimodal property prediction for small molecules can match or significantly outperform modern architectures on the hydration free energy (FreeSolv), experimental water solubility (ESOL), lipophilicity (Lipo), and clinical toxicity (ClinTox) tasks; (2) the MolPROP multimodal fusion is predominantly beneficial on regression tasks; (3) the ChemBERTa-2 masked language model (MLM) pretraining task outperformed the multitask regression (MTR) pretraining task when fused with graph neural networks for multimodal property prediction; and (4) despite the improvements from multimodal fusion on regression tasks, MolPROP significantly underperforms on some classification tasks. MolPROP has been made available at https://github.com/merck/MolPROP.

Volume 16, Issue 1. PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00846-9
Citations: 0
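MolPROP is benchmarked on scaffold-split MoleculeNet datasets. In a scaffold split, molecules that share a core scaffold (commonly the Bemis–Murcko scaffold) are kept in the same partition, so test molecules have cores never seen in training. The sketch below assumes the scaffold keys have already been computed, for example with a cheminformatics toolkit, and uses a common largest-groups-first heuristic; the 80/10/10 fractions and the function name are assumptions, not MolPROP's exact splitting code.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices so that each scaffold lands in exactly one partition.

    `scaffolds` holds one precomputed scaffold key per molecule. Larger scaffold
    groups are assigned first, filling train, then validation, then test."""
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)
    # Largest groups first; ties broken by first index, as in common implementations.
    ordered = sorted(groups.values(), key=lambda g: (len(g), g[0]), reverse=True)

    n = len(scaffolds)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


if __name__ == "__main__":
    toy_scaffolds = ["a"] * 5 + ["b"] * 3 + ["c", "d"]  # hypothetical scaffold keys
    print(scaffold_split(toy_scaffolds))
```

Filling the training set with the largest scaffold families leaves rarer scaffolds for evaluation, which makes the split deliberately harder than a random one.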
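Contact: info@booksci.cn. Book学术 provides a free academic resource search service that helps researchers in China and abroad find Chinese- and English-language literature. Copyright © 2023 布克学术. All rights reserved.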