Journal of Cheminformatics: Latest Articles

Distance plus attention for binding affinity prediction
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2024-05-12 DOI: 10.1186/s13321-024-00844-x
Julia Rahman, M. A. Hakim Newton, Mohammed Eunus Ali, Abdul Sattar
Abstract: Protein-ligand binding affinity plays a pivotal role in drug development, particularly in identifying potential ligands for target disease-related proteins. Accurate affinity predictions can significantly reduce both the time and cost involved in drug development. However, highly precise affinity prediction remains a research challenge. A key to improving affinity prediction is to capture interactions between proteins and ligands effectively. Existing deep-learning-based computational approaches use 3D grids, 4D tensors, molecular graphs, or proximity-based adjacency matrices, which are either resource-intensive or do not directly represent potential interactions. In this paper, we propose atomic-level distance features and attention mechanisms to better capture specific protein-ligand interactions based on donor-acceptor relations, hydrophobicity, and π-stacking atoms. We argue that distances encompass both short-range direct and long-range indirect interaction effects while attention mechanisms capture levels of interaction effects. On the very well-known CASF-2016 dataset, our proposed method, named Distance plus Attention for Affinity Prediction (DAAP), significantly outperforms existing methods by achieving Correlation Coefficient (R) 0.909, Root Mean Squared Error (RMSE) 0.987, Mean Absolute Error (MAE) 0.745, Standard Deviation (SD) 0.988, and Concordance Index (CI) 0.876. The proposed method also shows substantial improvement, around 2% to 37%, on five other benchmark datasets. The program and data are publicly available on the website https://gitlab.com/mahnewton/daap.
Scientific Contribution Statement: This study innovatively introduces distance-based features to predict protein-ligand binding affinity, capitalizing on unique molecular interactions. Furthermore, the incorporation of protein sequence features of specific residues enhances the model’s proficiency in capturing intricate binding patterns. The predictive capabilities are further strengthened through the use of a deep learning architecture with attention mechanisms, and an ensemble approach, averaging the outputs of five models, is implemented to ensure robust and reliable predictions.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00844-x
Citations: 0
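The DAAP abstract reports results as R, RMSE, MAE, and CI. As a rough illustration of what those numbers measure, the sketch below computes textbook versions of four of them in plain Python. It is not taken from the DAAP repository; the function names and the sample affinity values are hypothetical, and the SD metric is left out because its exact definition is not given in the abstract.

```python
import math
from itertools import combinations


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between experimental and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        if t1 == t2:
            continue
        comparable += 1
        if p1 == p2:
            concordant += 0.5
        elif (t1 < t2) == (p1 < p2):
            concordant += 1.0
    return concordant / comparable


# Hypothetical affinities (for example pKd values), for illustration only.
experimental = [4.2, 6.1, 7.8, 5.5]
predicted = [4.9, 5.8, 7.1, 6.0]
print(pearson_r(experimental, predicted))
print(rmse(experimental, predicted))
print(mae(experimental, predicted))
print(concordance_index(experimental, predicted))
```

R and CI reward correct ranking and are better when higher, while RMSE and MAE measure absolute error and are better when lower, which is why the abstract quotes both kinds.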
CIME4R: Exploring iterative, AI-guided chemical reaction optimization campaigns in their parameter space
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2024-05-10 DOI: 10.1186/s13321-024-00840-1
Christina Humer, Rachel Nicholls, Henry Heberle, Moritz Heckmann, Michael Pühringer, Thomas Wolf, Maximilian Lübbesmeyer, Julian Heinrich, Julius Hillenbrand, Giulio Volpin, Marc Streit
Abstract: Chemical reaction optimization (RO) is an iterative process that results in large, high-dimensional datasets. Current tools allow for only limited analysis and understanding of parameter spaces, making it hard for scientists to review or follow changes throughout the process. With the recent emergence of using artificial intelligence (AI) models to aid RO, another level of complexity has been added. Helping to assess the quality of a model’s prediction and understand its decision is critical to supporting human-AI collaboration and trust calibration. To address this, we propose CIME4R, an open-source interactive web application for analyzing RO data and AI predictions. CIME4R supports users in (i) comprehending a reaction parameter space, (ii) investigating how an RO process developed over iterations, (iii) identifying critical factors of a reaction, and (iv) understanding model predictions. This facilitates making informed decisions during the RO process and helps users to review a completed RO process, especially in AI-guided RO. CIME4R aids decision-making through the interaction between humans and AI by combining the strengths of expert experience and high computational precision. We developed and tested CIME4R with domain experts and verified its usefulness in three case studies. Using CIME4R the experts were able to produce valuable insights from past RO campaigns and to make informed decisions on which experiments to perform next. We believe that CIME4R is the beginning of an open-source community project with the potential to improve the workflow of scientists working in the reaction optimization domain.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00840-1
Citations: 0
Leveraging computational tools to combat malaria: assessment and development of new therapeutics
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2024-05-02 DOI: 10.1186/s13321-024-00842-z
Nomagugu B. Ncube, Matshawandile Tukulula, Krishna G. Govender
Abstract: As the world grapples with the relentless challenges posed by diseases like malaria, the advent of sophisticated computational tools has emerged as a beacon of hope in the quest for effective treatments. In this study we delve into the strategies behind computational tools encompassing virtual screening, molecular docking, artificial intelligence (AI), and machine learning (ML). We assess their effectiveness and contribution to the progress of malaria treatment. The convergence of these computational strategies, coupled with the ever-increasing power of computing systems, has ushered in a new era of drug discovery, holding immense promise for the eradication of malaria.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00842-z
Citations: 0
From papers to RDF-based integration of physicochemical data and adverse outcome pathways for nanomaterials
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2024-05-01 DOI: 10.1186/s13321-024-00833-0
Jeaphianne P. M. van Rijn, Marvin Martens, Ammar Ammar, Mihaela Roxana Cimpan, Valerie Fessard, Peter Hoet, Nina Jeliazkova, Sivakumar Murugadoss, Ivana Vinković Vrček, Egon L. Willighagen
Abstract: Adverse Outcome Pathways (AOPs) have been proposed to facilitate mechanistic understanding of interactions of chemicals/materials with biological systems. Each AOP starts with a molecular initiating event (MIE) and possibly ends with adverse outcome(s) (AOs) via a series of key events (KEs). So far, the interaction of engineered nanomaterials (ENMs) with biomolecules, biomembranes, cells, and biological structures, in general, is not yet fully elucidated. There is also a huge lack of information on which AOPs are ENMs-relevant or -specific, despite numerous published data on toxicological endpoints they trigger, such as oxidative stress and inflammation. We propose to integrate related data and knowledge recently collected. Our approach combines the annotation of nanomaterials and their MIEs with ontology annotation to demonstrate how we can then query AOPs and biological pathway information for these materials. We conclude that a FAIR (Findable, Accessible, Interoperable, Reusable) representation of the ENM-MIE knowledge simplifies integration with other knowledge.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00833-0
Citations: 0
QuanDB: a quantum chemical property database towards enhancing 3D molecular representation learning
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2024-04-29 DOI: 10.1186/s13321-024-00843-y
Zhijiang Yang, Tengxin Huang, Li Pan, Jingjing Wang, Liangliang Wang, Junjie Ding, Junhua Xiao
Abstract: Previous studies have shown that the three-dimensional (3D) geometric and electronic structure of molecules play a crucial role in determining their key properties and intermolecular interactions. Therefore, it is necessary to establish a quantum chemical (QC) property database containing the most stable 3D geometric conformations and electronic structures of molecules. In this study, a high-quality QC property database, called QuanDB, was developed, which included structurally diverse molecular entities and featured a user-friendly interface. Currently, QuanDB contains 154,610 compounds sourced from public databases and scientific literature, with 10,125 scaffolds. The elemental composition comprises nine elements: H, C, O, N, P, S, F, Cl, and Br. For each molecule, QuanDB provides 53 global and 5 local QC properties and the most stable 3D conformation. These properties are divided into three categories: geometric structure, electronic structure, and thermodynamics. Geometric structure optimization and single point energy calculation at the theoretical level of B3LYP-D3(BJ)/6-311G(d)/SMD/water and B3LYP-D3(BJ)/def2-TZVP/SMD/water, respectively, were applied to ensure highly accurate calculations of QC properties, with the computational cost exceeding 10^7 core-hours. QuanDB provides high-value geometric and electronic structure information for use in molecular representation models, which are critical for machine-learning-based molecular design, thereby contributing to a comprehensive description of the chemical compound space. As a new high-quality dataset for QC properties, QuanDB is expected to become a benchmark tool for the training and optimization of machine learning models, thus further advancing the development of novel drugs and materials. QuanDB is freely available, without registration, at https://quandb.cmdrg.com/.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00843-y
Citations: 0
Classification of battery compounds using structure-free Mendeleev encodings
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2024-04-26 DOI: 10.1186/s13321-024-00836-x
Zixin Zhuang, Amanda S. Barnard
Abstract: Machine learning is a valuable tool that can accelerate the discovery and design of materials occupying combinatorial chemical spaces. However, the prerequisite need for vast amounts of training data can be prohibitive when significant resources are needed to characterize or simulate candidate structures. Recent results have shown that structure-free encoding of complex materials, based entirely on chemical compositions, can overcome this impediment and perform well in unsupervised learning tasks. In this study, we extend this exploration to supervised classification, and show how structure-free encoding can accurately predict classes of material compounds for battery applications without time-consuming measurement of bonding networks, lattices or densities.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00836-x
Citations: 0
Learning symmetry-aware atom mapping in chemical reactions through deep graph matching
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2024-04-22 DOI: 10.1186/s13321-024-00841-0
Maryam Astero, Juho Rousu
Abstract: Accurate atom mapping, which establishes correspondences between atoms in reactants and products, is a crucial step in analyzing chemical reactions. In this paper, we present a novel end-to-end approach that formulates the atom mapping problem as a deep graph matching task. Our proposed model, AMNet (Atom Matching Network), utilizes molecular graph representations and employs various atom and bond features using graph neural networks to capture the intricate structural characteristics of molecules, ensuring precise atom correspondence predictions. Notably, AMNet incorporates the consideration of molecule symmetry, enhancing accuracy while simultaneously reducing computational complexity. The integration of the Weisfeiler-Lehman isomorphism test for symmetry identification refines the model’s predictions. Furthermore, our model maps the entire atom set in a chemical reaction, offering a comprehensive approach beyond focusing solely on the main molecules in reactions. We evaluated AMNet’s performance on a subset of USPTO reaction datasets, addressing various tasks, including assessing the impact of molecular symmetry identification, understanding the influence of feature selection on AMNet performance, and comparing its performance with the state-of-the-art method. The result reveals an average accuracy of 97.3% on mapped atoms, with 99.7% of reactions correctly mapped when the correct mapped atom is within the top 10 predicted atoms.
Scientific contribution: The paper introduces a novel end-to-end deep graph matching model for atom mapping, utilizing molecular graph representations to capture structural characteristics effectively. It enhances accuracy by integrating symmetry detection through the Weisfeiler-Lehman test, reducing the number of possible mappings and improving efficiency. Unlike previous methods, it maps the entire reaction, not just main components, providing a comprehensive view. Additionally, by integrating efficient graph matching techniques, it reduces computational complexity, making atom mapping more feasible.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00841-0
Citations: 0
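The AMNet abstract mentions the Weisfeiler-Lehman (WL) test for identifying molecular symmetry. The sketch below shows the generic 1-dimensional WL colour refinement on a small heavy-atom graph, to illustrate how atoms that end up with the same colour become candidates for symmetry equivalence. It is a minimal illustration, not AMNet's implementation: the graph encoding (adjacency dict plus element labels), the function names, and the omission of bond orders and hydrogens are all assumptions made here for brevity.

```python
def wl_colors(adjacency, labels, max_iter=10):
    """1-dimensional Weisfeiler-Lehman colour refinement.

    adjacency: dict mapping each node to a list of neighbouring nodes.
    labels: dict mapping each node to an initial label (e.g. element symbol).
    Returns a dict mapping each node to an integer colour class.
    """
    colors = dict(labels)
    for _ in range(max_iter):
        # A node's signature is its own colour plus the sorted colours of its neighbours.
        signatures = {
            node: (colors[node], tuple(sorted(colors[nb] for nb in adjacency[node])))
            for node in adjacency
        }
        # Compress each distinct signature into a small integer colour.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {node: palette[signatures[node]] for node in adjacency}
        # Refinement only ever splits classes, so an unchanged class count means stability.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors
    return colors


def equivalence_classes(colors):
    """Group nodes that share a final colour."""
    groups = {}
    for node, color in colors.items():
        groups.setdefault(color, []).append(node)
    return list(groups.values())


# Propane heavy-atom skeleton C0-C1-C2: the two terminal carbons are equivalent.
propane = {0: [1], 1: [0, 2], 2: [1]}
propane_colors = wl_colors(propane, {0: "C", 1: "C", 2: "C"})
print(equivalence_classes(propane_colors))
```

Equal WL colours are a necessary but not sufficient condition for true symmetry equivalence, so a production atom mapper would typically treat these classes as candidates; the abstract's point is that grouping atoms this way reduces the number of distinct mappings that have to be considered.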
Classification of substances by health hazard using deep neural networks and molecular electron densities
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2024-04-16 DOI: 10.1186/s13321-024-00835-y
Satnam Singh, Gina Zeh, Jessica Freiherr, Thilo Bauer, Isik Türkmen, Andreas T. Grasskamp
Abstract: In this paper we present a method that allows leveraging 3D electron density information to train a deep neural network pipeline to segment regions of high, medium and low electronegativity and classify substances as health hazardous or non-hazardous. We show that this can be used for use-cases such as cosmetics and food products. For this purpose, we first generate 3D electron density cubes using semiempirical molecular calculations for a custom European Chemicals Agency (ECHA) subset consisting of substances labelled as hazardous and non-hazardous for cosmetic usage. Together with their 3-class electronegativity maps we train a modified 3D-UNet with electron density cubes to segment reactive sites in molecules and classify substances with an accuracy of 78.1%. We perform the same process on a custom food dataset (CompFood) consisting of hazardous and non-hazardous substances compiled from European Food Safety Authority (EFSA) OpenFoodTox, Food and Drug Administration (FDA) Generally Recognized as Safe (GRAS) and FooDB datasets to achieve a classification accuracy of 64.1%. Our results show that 3D electron densities, and particularly masked electron densities, calculated by taking a product of original electron densities and regions of high and low electronegativity, can be used to classify molecules for different use-cases and thus serve not only to guide safe-by-design product development but also aid in regulatory decisions.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00835-y
Citations: 0
Meta-learning-based Inductive logistic matrix completion for prediction of kinase inhibitors
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2024-04-16 DOI: 10.1186/s13321-024-00838-9
Ming Du, XingRan Xie, Jing Luo, Jin Li
Abstract: Protein kinases have become an important source of potential drug targets. Developing new, efficient, and safe small-molecule kinase inhibitors has become an important topic in drug research and development. In contrast to traditional wet-lab experiments, which are time-consuming and expensive, machine-learning-based approaches for predicting small-molecule inhibitors of protein kinases are time-saving and cost-effective, and are therefore highly desirable. However, sample scarcity (known active and inactive compounds are usually limited for most kinases) poses a challenge to the development of machine-learning-based methods for predicting kinase inhibitor activity. To alleviate the data scarcity problem in the prediction of kinase inhibitors, in this study we present a novel Meta-learning-based Inductive Logistic Matrix Completion method for the prediction of kinase inhibitors (MetaILMC). MetaILMC adopts a meta-learning framework to learn a well-generalized model from tasks with sufficient samples, which can quickly adapt to new tasks with limited samples. As MetaILMC allows the effective transfer of prior knowledge learned from kinases with sufficient samples to kinases with few samples, the proposed model can produce accurate predictions for kinases with limited data. Experimental results show that MetaILMC performs excellently on few-shot kinase prediction tasks and is significantly superior to state-of-the-art multi-task learning across various performance metrics, including AUC and AUPR. Case studies predicting kinase inhibitory scores for two drugs are also provided, further validating the effectiveness and feasibility of the proposed method.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00838-9
Citations: 0
Mind your prevalence!
IF 8.6, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2024-04-15 DOI: 10.1186/s13321-024-00837-w
Sébastien J. J. Guesné, Thierry Hanser, Stéphane Werner, Samuel Boobier, Shaylyn Scott
Abstract: Multiple metrics are used when assessing and validating the performance of quantitative structure–activity relationship (QSAR) models. In the case of binary classification, balanced accuracy is a metric to assess the global performance of such models. In contrast to accuracy, balanced accuracy does not depend on the respective prevalence of the two categories in the test set that is used to validate a QSAR classifier. As such, balanced accuracy is used to overcome the effect of imbalanced test sets on the model’s perceived accuracy. Matthews' correlation coefficient (MCC), an alternative global performance metric, is also known to mitigate the imbalance of the test set. However, in contrast to the balanced accuracy, MCC remains dependent on the respective prevalence of the predicted categories. For simplicity, the rest of this work is based on the positive prevalence. The MCC value may be underestimated at high or extremely low positive prevalence. It contributes to more challenging comparisons between experiments using test sets with different positive prevalences and may lead to incorrect interpretations. The concept of balanced metrics beyond balanced accuracy is, to the best of our knowledge, not yet described in the cheminformatic literature. Therefore, after describing the relevant literature, this manuscript will first formally define a confusion matrix, sensitivity and specificity and then present, with synthetic data, the danger of comparing performance metrics under nonconstant prevalence. Second, it will demonstrate that balanced accuracy is the performance metric accuracy calibrated to a test set with a positive prevalence of 50% (i.e., balanced test set). This concept of balanced accuracy will then be extended to the MCC after showing its dependency on the positive prevalence. Applying the same concept to any other performance metric and widening it to the concept of calibrated metrics will then be briefly discussed. We will show that, like balanced accuracy, any balanced performance metric may be expressed as a function of the well-known values of sensitivity and specificity. Finally, a tale of two MCCs will exemplify the use of this concept of balanced MCC versus MCC with four use cases using synthetic data.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00837-w
Citations: 0
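The abstract of "Mind your prevalence!" argues that MCC depends on the positive prevalence of the test set, while a balanced version calibrated to 50% prevalence depends only on sensitivity and specificity. The sketch below illustrates that idea from the standard confusion-matrix definitions. It is not the authors' code: the function names and the example sensitivity and specificity values are hypothetical, and the closed form noted in the comment is derived here from the definitions rather than quoted from the paper.

```python
import math


def confusion_fractions(sensitivity, specificity, prevalence):
    """Expected confusion-matrix cells, as fractions of the test set."""
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)
    return tp, fn, tn, fp


def accuracy(sensitivity, specificity, prevalence):
    tp, _, tn, _ = confusion_fractions(sensitivity, specificity, prevalence)
    return tp + tn


def mcc(sensitivity, specificity, prevalence):
    """Matthews correlation coefficient at a given positive prevalence."""
    tp, fn, tn, fp = confusion_fractions(sensitivity, specificity, prevalence)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def balanced_accuracy(sensitivity, specificity):
    """Accuracy calibrated to a 50% positive prevalence."""
    return (sensitivity + specificity) / 2


def balanced_mcc(sensitivity, specificity):
    """MCC calibrated to a 50% positive prevalence.

    Substituting prevalence = 0.5 into the MCC definition gives
    (Se + Sp - 1) / sqrt(1 - (Se - Sp)**2).
    """
    return mcc(sensitivity, specificity, 0.5)


# Same hypothetical classifier (Se = Sp = 0.9) evaluated on differently balanced test sets.
for prevalence in (0.05, 0.1, 0.5, 0.9):
    print(prevalence,
          round(accuracy(0.9, 0.9, prevalence), 3),
          round(mcc(0.9, 0.9, prevalence), 3))
print("balanced MCC:", round(balanced_mcc(0.9, 0.9), 3))
```

Running the loop shows the MCC changing with prevalence while sensitivity and specificity stay fixed, which is the comparison hazard the abstract describes; the balanced variants stay constant because prevalence is pinned at 50%.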