{"title":"In silico prediction of drug-induced liver injury with a complementary integration strategy based on hybrid representation.","authors":"Yaxin Gu, Yimeng Wang, Zengrui Wu, Weihua Li, Guixia Liu, Yun Tang","doi":"10.1002/minf.202200284","DOIUrl":"https://doi.org/10.1002/minf.202200284","url":null,"abstract":"<p><p>Drug-induced liver injury (DILI) is one of the major causes of drug withdrawals, acute liver injury and black box warnings. Clinical diagnosis of DILI is a huge challenge due to the complex pathogenesis and lack of specific biomarkers. In recent years, machine learning methods have been used for DILI risk assessment, but the models do not generalize satisfactorily. In this study, we constructed a large DILI data set and proposed an integration strategy based on hybrid representations for DILI prediction (HR-DILI). Benefiting from feature integration, the hybrid graph neural network models outperformed single representation-based models, among which hybrid-GraphSAGE showed balanced performance in cross-validation with an AUC (area under the curve) of 0.804±0.019. In the external validation set, HR-DILI improved the AUC by 6.4 %-35.9 % compared to the base model with a single representation. Compared with published DILI prediction models, HR-DILI had better and more balanced performance. The performance of local models for natural products and synthetic compounds was also explored. Furthermore, eight key descriptors and six structural alerts associated with DILI were analyzed to increase the interpretability of the models. 
The improved performance of HR-DILI indicated that it would provide reliable guidance for DILI risk assessment.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9849638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring activity landscapes with extended similarity: is Tanimoto enough?","authors":"Timothy B Dunn, Edgar López-López, Taewon David Kim, José L Medina-Franco, Ramón Alain Miranda-Quintana","doi":"10.1002/minf.202300056","DOIUrl":"https://doi.org/10.1002/minf.202300056","url":null,"abstract":"<p><p>Understanding structure-activity landscapes is essential in drug discovery. Similarly, it has been shown that the presence of activity cliffs in compound data sets can have a substantial impact not only on the design progress but also can influence the predictive ability of machine learning models. With the continued expansion of the chemical space and the currently available large and ultra-large libraries, it is imperative to implement efficient tools to analyze the activity landscape of compound data sets rapidly. The goal of this study is to show the applicability of the n-ary indices to quantify the structure-activity landscapes of large compound data sets using different types of structural representation rapidly and efficiently. We also discuss how a recently introduced medoid algorithm provides the foundation to finding optimum correlations between similarity measures and structure-activity rankings. 
The applicability of the n-ary indices and the medoid algorithm is shown by analyzing the activity landscape of 10 compound data sets with pharmaceutical relevance using three fingerprints of different designs, 16 extended similarity indices, and 11 coincidence thresholds.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9794062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of a PD1/PD-L1 inhibitor by structure-based pharmacophore modelling, virtual screening, molecular docking and biological evaluation.","authors":"Gopi Mohan C, Anju Pushkaran, Kumaran K, Ann MariaT, Raja Biswas","doi":"10.1002/minf.202200254","DOIUrl":"https://doi.org/10.1002/minf.202200254","url":null,"abstract":"<p><p>PD-1/PD-L1 is a critical druggable target for immunotherapy against sepsis. The chemoinformatics workflow involved structure-based 3D pharmacophore model development followed by virtual screening of small-molecule databases to identify small molecules that inhibit the PD-L1 pathway. Using in silico methods, Raltitrexed and Safinamide were identified as potent repurposed drugs, along with three other compounds from the Specs database. These compounds were screened based on the pharmacophore fit score and binding affinity towards the active site of the PD-L1 protein. In silico pharmacokinetic profiling of these screened compounds was done to test their biological activity. Next, the best four virtually screened hits were validated experimentally in vitro for their hemocompatibility and cytotoxicity. Among these, Raltitrexed, Safinamide and the Specs compound AK-968/40642641 effectively increased the proliferation of immune cells and IFN-γ production. These compounds can act as potent PD-L1 inhibitors for adjuvant therapy against sepsis.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9680278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compression of molecular fingerprints with autoencoder networks.","authors":"Gisbert Schneider, Agnieszka Ilnicka","doi":"10.1002/minf.202300059","DOIUrl":"https://doi.org/10.1002/minf.202300059","url":null,"abstract":"<p><p>Several binary molecular fingerprints were compressed using an autoencoder neural network. We analyzed the impact of compression on fingerprint performance in downstream classification and regression tasks. Classifiers trained on compressed fingerprints were negligibly affected. Regression models benefitted from compression, especially of long fingerprints (Morgan, RDK). However, their performance dropped rapidly for compression levels exceeding 90 %. Property co-learning positively influenced the predictive power of the compressed fingerprints, with a mean score improvement up to 20 %, suggesting that autoencoder compression with property co-learning biases the molecular representation toward the predicted target, facilitating downstream training.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9681391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overproduce and select, or determine optimal molecular descriptor subset via configuration space optimization? Application to the prediction of ecotoxicological endpoints.","authors":"Luis A García-González, Yovani Marrero-Ponce, Carlos A Brizuela, César R García-Jacas","doi":"10.1002/minf.202200227","DOIUrl":"https://doi.org/10.1002/minf.202200227","url":null,"abstract":"<p><p>Predicting the likely biological activity (or property) of compounds is a fundamental and challenging task in the drug discovery process. Current computational methodologies aim to improve their predictive accuracies by using deep learning (DL) approaches. However, non-DL-based approaches have been shown to be the most suitable for small- and medium-sized chemical datasets. In such an approach, an initial universe of molecular descriptors (MDs) is first calculated, then different feature selection algorithms are applied, and finally, one or several predictive models are built. Herein we demonstrate that this traditional approach may miss relevant information by assuming that the initial universe of MDs codifies all relevant aspects for the respective learning task. We argue that this limitation is mainly because of the constrained intervals of the parameters used in the algorithms that compute MDs, parameters that define the Descriptor Configuration Space (DCS). We propose to relax these constraints in an open DCS approach, so that a larger universe of MDs can be initially considered. We model the generation of MDs as a multicriteria optimization problem and tackle it with a variant of the standard genetic algorithm. As a novel component, the fitness function is computed by aggregating four criteria via the Choquet integral. 
Experimental results show that the proposed approach generates a meaningful DCS by improving state-of-the-art approaches in most of the benchmarking chemical datasets accounted for.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9682498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gas-to-ionic liquid partition: QSPR modeling and mechanistic interpretation.","authors":"Jia-Xi Chang, Jian-Wei Zou, Chao-Yuan Lou, Jia-Xin Ye, Rui Feng, Zi-Yuan Li, Gui-Xiang Hu","doi":"10.1002/minf.202200223","DOIUrl":"https://doi.org/10.1002/minf.202200223","url":null,"abstract":"<p><p>The present work explores the quantitative structure-property relationships for gas-to-ionic liquid partition coefficients (log K<sub>ILA</sub>). A series of linear models were first established for the representative dataset (IL01). The optimal model was a four-parameter equation (1Ed) consisting of two electrostatic potential-based descriptors (ΣV<sub>s,ind</sub><sup>-</sup> and V<sub>s,max</sub>), one 2D matrix-based descriptor (J_D/Dt) and dipole moment (μ). All four descriptors introduced in the model have corresponding parameters, directly or indirectly, in Abraham's linear solvation energy relationship (LSER) or its theoretical alternatives, which endows the model with good interpretability. A Gaussian process was utilized to build the nonlinear model. Systematic validations, including 5-fold cross-validation for the training set, validation on the test set, as well as a more rigorous Monte Carlo cross-validation, were performed to verify the reliability of the constructed models. The applicability domain of the model was evaluated, and the Williams plot revealed that the model can be used to predict the log K<sub>ILA</sub> values of structurally diverse solutes. The other 13 datasets were also processed in the same way, and linear models with expressions similar to equation 1Ed were obtained for all of them. 
These models, whether linear or nonlinear, show satisfactory statistical results, which confirms the universality of the method adopted in this study in QSPR modeling of gas-to-IL partition.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10056650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmenting bioactivity by docking-generated multiple ligand poses to enhance machine learning and pharmacophore modelling: discovery of new TTK inhibitors as case study.","authors":"Amenah M Al-Imam, Safa Daoud, Ma'mon M Hatmal, Mutasem Omar Taha","doi":"10.1002/minf.202300022","DOIUrl":"https://doi.org/10.1002/minf.202300022","url":null,"abstract":"<p><p>Dual specificity protein kinase threonine/tyrosine kinase (TTK) is one of the mitotic kinases. High levels of TTK are detected in several types of cancer. Hence, TTK inhibition is considered a promising therapeutic anti-cancer strategy. In this work, we used multiple docked poses of TTK inhibitors to augment training data for machine learning QSAR modeling. Ligand-Receptor Contacts Fingerprints and docking scoring values were used as descriptor variables. Escalating docking-scoring consensus levels were scanned against orthogonal machine learners, and the best learners (Random Forests and XGBoost) were coupled with a genetic algorithm and Shapley additive explanations (SHAP) to determine critical descriptors for predicting anti-TTK bioactivity and for pharmacophore generation. Three successful pharmacophores were deduced and subsequently used for in silico screening against the NCI database. A total of 14 hits were evaluated in vitro for their anti-TTK bioactivities. One hit of a novel chemotype showed a reasonable dose-response curve with an experimental IC<sub>50</sub> of 1.0 μM. 
The presented work indicates the validity of data augmentation using multiple docked poses for building successful machine learning models and pharmacophore hypotheses.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9675061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-model for chemical toxicity prediction based on multi-task deep learning.","authors":"Yuan Yuan Li, Lingfeng Chen, Chengtao Pu, Chengdong Zang, YingChao Yan, Yadong Chen, Yanmin Zhang, Haichun Liu","doi":"10.1002/minf.202200257","DOIUrl":"https://doi.org/10.1002/minf.202200257","url":null,"abstract":"<p><p>The toxicity of compounds is closely related to the effectiveness and safety of drug development, and accurately predicting the toxicity of compounds is one of the most challenging tasks in medicinal chemistry and pharmacology. In this paper, we construct three types of models for single and multi-tasking based on 2D and 3D descriptors, fingerprints and molecular graphs, and then validate the models with benchmark tests on the Tox21 data challenge. We found that due to the information sharing mechanism of multi-task learning, it could address the imbalance problem of the Tox21 data sets to some extent, and the prediction performance of the multi-task was significantly improved compared with the single task in general. Given the complement of the different molecular representations and modeling algorithms, we attempted to integrate them into a robust Co-Model. 
Our Co-Model performs well in various evaluation metrics on the test set and also achieves significant performance improvement compared to other models in the literature, which clearly demonstrates its superior predictive power and robustness.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9510308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring cooperative molecular contacts using a PostgreSQL database system.","authors":"Mael A Briand, Loïc Dreano, Ashenafi Legehar, Evgeni Grazhdankin, Leo Ghemtio, Henri Xhaard","doi":"10.1002/minf.202200235","DOIUrl":"https://doi.org/10.1002/minf.202200235","url":null,"abstract":"<p><p>Cooperative molecular contacts play an important role in protein structure and ligand binding. Here, we constructed a PostgreSQL database that stores structural information in the form of atomic environments and allows flexible mining of molecular contacts. Taking the Ser-His-Asp/Glu catalytic triad as a first test case, we demonstrate that the presence of a carboxylate oxygen atom in the vicinity of a His is associated with shorter Ser-OH..N-His bond in the PDB30 subset. We prospectively mine catalytic triads in unannotated proteins, suggesting catalytic functions for unannotated proteins. As a second test case, we demonstrate that this database system can include ligand atoms, represented by Sybyl atom types, by evaluating the proportion of counter-ions for ligand carboxylate oxygens.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9457184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A machine learning strategy with clustering under sampling of majority instances for predicting drug target interactions.","authors":"Tanya Liyaqat, Tanvir Ahmad","doi":"10.1002/minf.202200102","DOIUrl":"https://doi.org/10.1002/minf.202200102","url":null,"abstract":"<p><p>Predicting Drug Target Interactions (DTIs) is crucial in drug discovery, as it reduces the range of candidate searches and speeds up the drug screening process. Because in vitro and in vivo experiments are time-consuming and costly, there has been a surge in computational techniques, especially ML methods, for DTI prediction. This study presents a methodology that uses molecular structures and amino acid sequences to generate PubChem fingerprints and PSSMs for drugs and targets, respectively. The proposed work uses a novel technique, NearestCUS, for handling the class imbalance problem of the benchmark datasets. We use Isomap Embedding to extract features from PSSMs. Feature selection is performed using ANOVA. CatBoost is used for predicting the interaction between drugs and targets for the first time. To quantify the efficacy of NearestCUS, we compared it with other sampling techniques. We found that the proposed methodology performed better than state-of-the-art approaches.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9460164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}