A machine learning strategy with clustering under sampling of majority instances for predicting drug target interactions
Tanya Liyaqat, Tanvir Ahmad
Molecular Informatics, published 2023-05-01
DOI: 10.1002/minf.202200102 (https://doi.org/10.1002/minf.202200102)
Abstract
Predicting drug-target interactions (DTIs) is crucial in drug discovery because it narrows the range of candidate searches, speeding up the drug screening process. Because in vitro and in vivo experiments are time-consuming and costly, there has been a surge in computational techniques, especially machine learning (ML) methods, for DTI prediction. This study presents a methodology that uses molecular structures and amino acid sequences to generate PubChem fingerprints for drugs and position-specific scoring matrices (PSSMs) for targets, respectively. The proposed work uses a novel technique, NearestCUS, to handle the class imbalance problem of the benchmark datasets. We use Isomap embedding to extract features from the PSSMs, and feature selection is performed using ANOVA. CatBoost is used, for the first time, to predict interactions between drugs and targets. To quantify the efficacy of NearestCUS, we compared it with other sampling techniques. We found that the proposed methodology performed better than state-of-the-art approaches.
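The abstract does not spell out how NearestCUS works, so the following is only a minimal sketch of the general idea of clustering-based undersampling of the majority class, not the authors' algorithm. It assumes the majority (non-interacting) pairs are clustered with plain k-means and that, for each cluster, the real instance nearest the centroid is kept. The function names `kmeans` and `cluster_undersample` and the toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign():
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave the centroid unchanged if its cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign()


def cluster_undersample(majority, n_keep, seed=0):
    """Keep at most n_keep majority instances: one per cluster, nearest its centroid.

    Assumption: this illustrates generic cluster-based undersampling only;
    it is not claimed to reproduce NearestCUS.
    """
    centroids, labels = kmeans(majority, n_keep, seed=seed)
    kept = []
    for c, centroid in enumerate(centroids):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if members:
            kept.append(min(members, key=lambda p: math.dist(p, centroid)))
    return kept


# Toy feature vectors (hypothetical): 8 majority vs. 3 minority instances.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0),
            (5.1, 4.9), (4.9, 5.2), (9.0, 0.0), (9.2, 0.1)]
minority = [(2.0, 2.0), (7.0, 3.0), (4.0, 1.0)]

kept = cluster_undersample(majority, n_keep=len(minority))
balanced = kept + minority
```

Keeping real instances, rather than synthetic centroids, means every retained negative example corresponds to an actual drug-target pair. In a full pipeline, the balanced set would then pass through feature selection and a classifier such as CatBoost.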
About the journal:
Molecular Informatics is a peer-reviewed, international forum for publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010.
Molecular Informatics presents methodological innovations that will lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, molecular networks, design concepts and processes that demonstrate how ideas and design concepts lead to molecules with a desired structure or function, preferably including experimental validation.
The journal's scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature, Molecular Informatics publishes so-called "Methods Corner" review-type articles, which feature important technological concepts and advances within the scope of the journal.