{"title":"A machine learning strategy with clustering under sampling of majority instances for predicting drug target interactions.","authors":"Tanya Liyaqat, Tanvir Ahmad","doi":"10.1002/minf.202200102","DOIUrl":"https://doi.org/10.1002/minf.202200102","url":null,"abstract":"<p><p>Drug Target Interactions (DTIs) are crucial in drug discovery as they narrow the range of candidate searches, speeding up the drug screening process. Because in vitro and in vivo experiments are time-consuming and expensive, there has been a surge in computational techniques, especially machine learning (ML) methods, for DTI prediction. Therefore, this study presents a methodology that uses molecular structures and amino acid sequences to generate PubChem fingerprints and PSSMs for drugs and targets, respectively. The proposed work uses a novel technique, NearestCUS, to handle the class imbalance problem of the benchmark datasets. We use Isomap Embedding to extract features from PSSMs. Feature selection is performed using ANOVA. CatBoost is used for the first time to predict the interaction between drugs and targets. To quantify the efficacy of NearestCUS, we compared it with other sampling techniques. We found that the proposed methodology performed better than state-of-the-art approaches.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 5","pages":"e2200102"},"PeriodicalIF":3.6,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9460164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new set of KNIME nodes implementing the QPhAR algorithm.","authors":"Stefan Kohlbacher, Gökhan Ibis, Christian Permann, Sharon Bryant, Thierry Langer, Thomas Seidel","doi":"10.1002/minf.202200245","DOIUrl":"https://doi.org/10.1002/minf.202200245","url":null,"abstract":"<p><p>Dissemination of novel research methods, especially in the form of chemoinformatics software, depends heavily on their ease of use for non-expert users with little or no programming skill or knowledge of computer science. Visual programming has become widely popular over the last few years, enabling researchers without in-depth programming skills to develop tailored data processing pipelines using elements from a repository of predefined standard procedures. In this work, we present the development of a set of nodes for the KNIME platform implementing the QPhAR algorithm. We show how the developed KNIME nodes can be included in a typical workflow for biological activity prediction. Furthermore, we present best-practice guidelines that should be followed to obtain high-quality QPhAR models. Finally, we show a typical workflow to train and optimise a QPhAR model in KNIME for a set of given input compounds, applying the discussed best practices.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 5","pages":"e2200245"},"PeriodicalIF":3.6,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9826136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fragment-based deep molecular generation using hierarchical chemical graph representation and multi-resolution graph variational autoencoder.","authors":"Zhenxiang Gao, Xinyu Wang, Blake Blumenfeld Gaines, Xuetao Shi, Jinbo Bi, Minghu Song","doi":"10.1002/minf.202200215","DOIUrl":"https://doi.org/10.1002/minf.202200215","url":null,"abstract":"<p><p>Graph generative models have recently emerged as an interesting approach to construct molecular structures atom-by-atom or fragment-by-fragment. In this study, we adopt the fragment-based strategy and decompose each input molecule into a set of small chemical fragments. In drug discovery, a few drug molecules are designed by replacing certain chemical substituents with their bioisosteres or alternative chemical moieties. This inspires us to group decomposed fragments into different fragment clusters according to their local structural environment around bond-breaking positions. In this way, an input structure can be transformed into an equivalent three-layer graph, in which individual atoms, decomposed fragments, or obtained fragment clusters act as graph nodes at each corresponding layer. We further implement a prototype model, named multi-resolution graph variational autoencoder (MRGVAE), to learn embeddings of constituted nodes at each layer in a fine-to-coarse order. Our decoder adopts a similar but conversely hierarchical structure. It first predicts the next possible fragment cluster, then samples an exact fragment structure out of the determined fragment cluster, and sequentially attaches it to the preceding chemical moiety. Our proposed approach demonstrates comparatively good performance in molecular evaluation metrics compared with several other graph-based molecular generative models. The introduction of the additional fragment cluster graph layer will hopefully increase the odds of assembling new chemical moieties absent in the original training set and enhance their structural diversity. We hope that our prototyping work will inspire more creative research to explore the possibility of incorporating different kinds of chemical domain knowledge into a similar multi-resolution neural network architecture.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 5","pages":"e2200215"},"PeriodicalIF":3.6,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9455075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovery of natural-derived M<sup>pro</sup> inhibitors as therapeutic candidates for COVID-19: Structure-based pharmacophore screening combined with QSAR analysis.","authors":"Mohammad A Khanfar, Nada Salaas, Reem Abumostafa","doi":"10.1002/minf.202200198","DOIUrl":"https://doi.org/10.1002/minf.202200198","url":null,"abstract":"<p><p>The main protease (M<sup>pro</sup>) is an essential enzyme for the life cycle of SARS-CoV-2 and a validated target for treatment of COVID-19 infection. Structure-based pharmacophore modeling combined with QSAR calculations was employed to identify new chemical scaffolds of M<sup>pro</sup> inhibitors from a natural products repository. Hundreds of pharmacophore models were manually built from their corresponding X-ray crystallographic structures. A pharmacophore model that was validated by receiver operating characteristic (ROC) curve analysis and selected using the statistically optimal QSAR equation was implemented as a 3D search tool to mine the AnalytiCon Discovery database of natural products. Captured hits that showed the highest predicted inhibitory activities were bioassayed. Three active M<sup>pro</sup> inhibitors (pseurotin A, lactupicrin, and alpinetin) were successfully identified with IC<sub>50</sub> values in the low micromolar range.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 4","pages":"e2200198"},"PeriodicalIF":3.6,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9660815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring isofunctional molecules: Design of a benchmark and evaluation of prediction performance.","authors":"Philippe Pinel, Gwenn Guichaoua, Matthieu Najm, Stéphanie Labouille, Nicolas Drizard, Yann Gaston-Mathé, Brice Hoffmann, Véronique Stoven","doi":"10.1002/minf.202200216","DOIUrl":"https://doi.org/10.1002/minf.202200216","url":null,"abstract":"<p><p>Identification of novel chemotypes with biological activity similar to a known active molecule is an important challenge in drug discovery called 'scaffold hopping'. Small-, medium-, and large-step scaffold hopping efforts may lead to increasing degrees of chemical structure novelty with respect to the parent compound. In the present paper, we focus on the problem of large-step scaffold hopping. We assembled a high-quality and well-characterized dataset of scaffold hopping examples comprising pairs of active molecules and including a variety of protein targets. This dataset was used to build a benchmark corresponding to the setting of real-life applications: one active molecule is known, and the second active is searched for among a set of decoys chosen so as to avoid statistical bias. This allowed us to evaluate the performance of computational methods for solving large-step scaffold hopping problems. In particular, we assessed how difficult these problems are for classical 2D and 3D ligand-based methods. We also showed that a machine-learning chemogenomic algorithm outperforms classical methods, and we provided some useful hints for future improvements.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 4","pages":"e2200216"},"PeriodicalIF":3.6,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9645704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparison between 2D and 3D descriptors in QSAR modeling based on bio-active conformations.","authors":"Hanoch Senderowitz, Malkeet Singh Bahia, Omer Kaspi, Meir Touitou, Idan Binayev, Seema Dhail, Jacob Spiegel, Netaly Khazanov, Abraham Yosipof","doi":"10.1002/minf.202200186","DOIUrl":"https://doi.org/10.1002/minf.202200186","url":null,"abstract":"<p><p>QSAR models are widely and successfully used in many research areas. The success of such models depends highly on molecular descriptors, typically classified as 1D, 2D, 3D, or 4D. While 3D information is likely important, e.g., for modeling ligand-protein binding, previous comparisons between the performances of 2D and 3D descriptors were inconclusive. Yet in such comparisons the modeled ligands were not necessarily represented by their bioactive conformations. With this in mind, we mined the PDB for sets of protein-ligand complexes sharing the same protein for which uniform activity data were reported. The results, totaling 461 structures spread across six series, were compiled into a carefully curated, first-of-its-kind dataset in which each ligand is represented by its bioactive conformation. Next, each set was characterized by 2D, 3D and 2D + 3D descriptors and modeled using three machine learning algorithms, namely, k-Nearest Neighbors, Random Forest and Lasso Regression. Model performance was evaluated on external test sets derived from the parent datasets either randomly or in a rational manner. We found that many more significant models were obtained when combining 2D and 3D descriptors. We attribute these improvements to the ability of 2D and 3D descriptors to code for different, yet complementary molecular properties.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 4","pages":"e2200186"},"PeriodicalIF":3.6,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9296517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"French dispatch: GTM-based analysis of the Chimiothèque Nationale Chemical Space.","authors":"Polina Oleneva, Yuliana Zabolotna, Dragos Horvath, Gilles Marcou, Fanny Bonachera, Alexandre Varnek","doi":"10.1002/minf.202200208","DOIUrl":"https://doi.org/10.1002/minf.202200208","url":null,"abstract":"<p><p>In order to analyze the Chimiothèque Nationale (CN), the French National Compound Library, in the context of screening and biologically relevant compounds, the library was compared with the ZINC in-stock collection and ChEMBL. The comparison covered chemical space coverage, physicochemical properties and Bemis-Murcko (BM) scaffold populations. More than 5,000 CN-unique scaffolds (relative to the ZINC and ChEMBL collections) were identified. Generative Topographic Maps (GTMs) accommodating those libraries were generated and used to compare the compound populations. Hierarchical GTM («zooming») was applied to generate an ensemble of maps at various resolution levels, from a global overview to precise mapping of individual structures. The respective maps were added to the ChemSpace Atlas website. The analysis of synthetic accessibility in the context of combinatorial chemistry showed that only 29.7 % of CN compounds can be fully synthesized using commercially available building blocks.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 4","pages":"e2200208"},"PeriodicalIF":3.6,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9653057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A machine learning q-RASPR approach for efficient predictions of the specific surface area of perovskites.","authors":"Arkaprava Banerjee, Agnieszka Gajewicz-Skretna, K Roy","doi":"10.1002/minf.202200261","DOIUrl":"https://doi.org/10.1002/minf.202200261","url":null,"abstract":"<p><p>In this study, the specific surface area of various perovskites was modeled using a novel quantitative read-across structure-property relationship (q-RASPR) approach, which combines Read-Across (RA) and quantitative structure-property relationship (QSPR) modeling. After optimization of the hyper-parameters, certain similarity-based error measures for each query compound were obtained. Combining some of these error-based measures with the previously selected features along with the Read-Across prediction function, a number of machine learning models were developed using Partial Least Squares (PLS), Ridge Regression (RR), Linear Support Vector Regression (LSVR), Random Forest (RF) regression, Gradient Boost (GBoost), Adaptive Boosting (Adaboost), Multilayer Perceptron (MLP) regression and k-Nearest Neighbor (kNN) regression. Based on the repeated cross-validation as well as external prediction quality and interpretability, the PLS model (n<sub>Training</sub> = 38, n<sub>Test</sub> = 12, R<sup>2</sup><sub>Train</sub> = 0.737, Q<sup>2</sup><sub>LOO</sub> = 0.637, R<sup>2</sup><sub>Test</sub> = 0.898, Q<sup>2</sup><sub>F1(Test)</sub> = 0.901) was selected as the best predictor, which underscored the previously reported results. The final model should efficiently predict specific surface areas of other perovskites for their use in photocatalysis. The new q-RASPR method also appears promising for the prediction of several other property endpoints of interest in materials science.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 4","pages":"e2200261"},"PeriodicalIF":3.6,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9284533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial neural network models driven novel virtual screening workflow for the identification and biological evaluation of BACE1 inhibitors.","authors":"Kushagra Kashyap, Lalita Panigrahi, Shakil Ahmed, Mohammad Siddiqi","doi":"10.1002/minf.202200113","DOIUrl":"https://doi.org/10.1002/minf.202200113","url":null,"abstract":"<p><p>Beta-site amyloid-β precursor protein-cleaving enzyme 1 (BACE1) is a transmembrane aspartic protease and has shown potential as a therapeutic target for Alzheimer's disease. This progressive disease involves the aberrant production of β-amyloid plaques: BACE1 catalyzes the rate-limiting step by cleaving the amyloid precursor protein (APP), generating the neurotoxic amyloid β protein that aggregates to form plaques, leading to neurodegeneration. Therefore, it is essential to inhibit BACE1 and thereby modulate APP processing. In this study, we present a workflow that utilizes a multi-stage virtual screening protocol for identifying potential BACE1 inhibitors by employing multiple artificial neural network-based models. Collectively, all the hyperparameter-tuned models were used to virtually screen the Maybridge library, yielding a consensus of 41 hits. The majority of these hits exhibited optimal pharmacokinetic properties, as confirmed by high central nervous system multiparameter optimization (CNS-MPO) scores. Further shortlisting of 8 compounds by molecular docking into the active site of BACE1 and their subsequent in vitro evaluation identified 4 compounds as potent BACE1 inhibitors with IC50 values in the range 0.028-0.052 μM; these can be further optimized through medicinal chemistry efforts to improve their activity.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"42 3","pages":"e2200113"},"PeriodicalIF":3.6,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9284492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}