Prediction of drug-induced nephrotoxicity based on deep learning algorithm and molecular fingerprints
Authors: Shuailong Wang, Yan Li
Journal: Molecular Diversity (impact factor 3.8; JCR Q2, Chemistry, Applied)
Publication date: 2025-10-12
Publication type: Journal Article
DOI: 10.1007/s11030-025-11376-3 (https://doi.org/10.1007/s11030-025-11376-3)
Citations: 0
Abstract
Drug-induced nephrotoxicity (DIN) is an infrequent adverse reaction to medications and a complex clinical outcome influenced by multiple factors. Predicting DIN with preclinical animal models remains challenging, and in silico approaches have emerged as promising alternatives for DIN risk assessment. In this study, a dataset of 1,018 compounds with clear labels was constructed from five sources: SIDER, FDA, ChEMBL, DrugBank, and literature on "drug-induced nephrotoxicity" published in the past decade (screened via keyword search on PubMed). Compound screening and label annotation followed explicit criteria. Using "kidney," "nephrotoxicity," "kidney injury," and "kidney disease" as core search terms, retrieved compounds that were clearly associated with kidney injury or could induce kidney disease were assigned to the positive set (DIN = 1); compounds with no records of renal adverse reactions, or those explicitly having renal protective effects or used to treat renal diseases, were assigned to the negative set (DIN = 0). A total of 42 classification models were built by pairing each of six molecular fingerprints with each of seven algorithms: a deep neural network (DNN) and six machine learning algorithms. A comparative study showed that the DNN models consistently outperformed the traditional machine learning approaches across all six fingerprint types. The ECFP_6 fingerprint gave the best performance, with an area under the receiver operating characteristic curve (AUC) of 75.9%, an accuracy (ACC) of 71.4%, and an F1-score of 76.0%. The SHapley Additive exPlanations (SHAP) algorithm was then applied to interpret the predictions of the high-performing models and to identify key structural fragments associated with DIN. The ten substructures with the greatest impact on model predictions were selected as early warning markers for future DIN screening research. Overall, these results suggest that DNN models built on molecular fingerprints can serve as dependable and efficient tools for assessing the nephrotoxicity risk of drug candidates in the early phases of drug development.
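As a reader's aid, the sketch below illustrates two pieces of arithmetic behind the abstract: how seven algorithms and six fingerprints yield 42 models, and how AUC, ACC, and F1 are computed for a binary DIN label. It is a minimal, standard-library illustration and not the authors' code. The abstract names only ECFP_6 and DNN, so every other fingerprint and algorithm name here is a hypothetical placeholder, as are the function names.

```python
from itertools import product

# Only "ECFP_6" and "DNN" come from the abstract; the rest are placeholders.
FINGERPRINTS = ["ECFP_6"] + [f"fingerprint_{i}" for i in range(2, 7)]
ALGORITHMS = ["DNN"] + [f"ml_algorithm_{i}" for i in range(1, 7)]


def model_grid():
    """Return every (algorithm, fingerprint) pairing: 7 x 6 = 42 models."""
    return list(product(ALGORITHMS, FINGERPRINTS))


def accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (DIN = 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold(scores, cutoff=0.5):
    """Turn predicted probabilities into 0/1 labels (cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]
```

AUC is computed from the raw scores, while ACC and F1 depend on the chosen cutoff, which is why a model can rank compounds reasonably well yet show a different accuracy. The 0.5 cutoff used here is an assumption; the abstract does not state one.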
About the journal:
Molecular Diversity is a new publication forum for the rapid publication of refereed papers dedicated to describing the development, application and theory of molecular diversity and combinatorial chemistry in basic and applied research and drug discovery. The journal publishes both short and full papers, perspectives, news and reviews dealing with all aspects of the generation of molecular diversity, application of diversity for screening against alternative targets of all types (biological, biophysical, technological), analysis of results obtained and their application in various scientific disciplines/approaches including:
combinatorial chemistry and parallel synthesis;
small molecule libraries;
microwave synthesis;
flow synthesis;
fluorous synthesis;
diversity oriented synthesis (DOS);
nanoreactors;
click chemistry;
multiplex technologies;
fragment- and ligand-based design;
structure/function/SAR;
computational chemistry and molecular design;
chemoinformatics;
screening techniques and screening interfaces;
analytical and purification methods;
robotics, automation and miniaturization;
targeted libraries;
display libraries;
peptides and peptoids;
proteins;
oligonucleotides;
carbohydrates;
natural diversity;
new methods of library formulation and deconvolution;
directed evolution, origin of life and recombination;
search techniques, landscapes, random chemistry and more.