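Ensemble Machine Learning Approaches Based on Molecular Descriptors and Graph Convolutional Networks for Predicting the Efflux Activities of MDR1 and BCRP Transporters.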

IF 5.0 · Zone 3 (Medicine) · JCR Q1 · Pharmacology & Pharmacy

AAPS Journal Pub Date : 2023-09-12 DOI:10.1208/s12248-023-00853-y

Asahi Adachi, Tomoki Yamashita, Shigehiko Kanaya, Yohei Kosugi


Citations: 0

Abstract


Ensemble Machine Learning Approaches Based on Molecular Descriptors and Graph Convolutional Networks for Predicting the Efflux Activities of MDR1 and BCRP Transporters.


Multidrug resistance (MDR1) and breast cancer resistance protein (BCRP) play important roles in drug absorption and distribution. Computational prediction of substrates for both transporters can help reduce time in drug discovery. This study aimed to predict the efflux activity of MDR1 and BCRP using multiple machine learning approaches with molecular descriptors and graph convolutional networks (GCNs). In vitro efflux activity was determined using MDR1- and BCRP-expressing cells. Predictive performance was assessed using an in-house dataset with a chronological split and an external dataset. CatBoost and support vector regression showed the best predictive performance for MDR1 and BCRP efflux activities, respectively, of the 25 descriptor-based machine learning methods based on the coefficient of determination (R²). The single-task GCN showed a slightly lower performance than descriptor-based prediction in the in-house dataset. In both approaches, the percentage of compounds predicted within twofold of the observed values in the external dataset was lower than that in the in-house dataset. Multi-task GCN did not show any improvements, whereas multimodal GCN increased the predictive performance of BCRP efflux activity compared with single-task GCN. Furthermore, the ensemble approach of descriptor-based machine learning and GCN achieved the highest predictive performance with R² values of 0.706 and 0.587 in MDR1 and BCRP, respectively, in time-split test sets. This result suggests that two different approaches to represent molecular structures complement each other in terms of molecular characteristics. Our study demonstrated that predictive models using advanced machine learning approaches are beneficial for identifying potential substrate liability of both MDR1 and BCRP.
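The evaluation quantities named in the abstract (R², the share of compounds predicted within twofold of the observed value, a chronological split, and an ensemble of two model families) can be illustrated with a minimal Python sketch. The abstract does not say how the descriptor-based and GCN predictions were combined, so the unweighted average used here is an assumption, as are all function names, the 20% test fraction, and the treatment of efflux activity as a positive ratio. This is not the authors' code.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fraction_within_twofold(observed_ratio, predicted_ratio):
    """Share of compounds whose predicted efflux ratio lies within
    twofold of the observed ratio (both assumed strictly positive)."""
    observed_ratio = np.asarray(observed_ratio, dtype=float)
    predicted_ratio = np.asarray(predicted_ratio, dtype=float)
    fold_error = np.abs(np.log10(predicted_ratio / observed_ratio))
    return float(np.mean(fold_error <= np.log10(2.0)))


def average_ensemble(*predictions):
    """Unweighted mean of per-model predictions (an assumed combination
    rule; the abstract does not specify one)."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in predictions])
    return stacked.mean(axis=0)


def chronological_split(dates, test_fraction=0.2):
    """Return (train_idx, test_idx) with the most recent compounds
    held out as the test set."""
    order = np.argsort(np.asarray(dates), kind="stable")
    n_test = max(1, int(round(len(order) * test_fraction)))
    return order[:-n_test], order[-n_test:]


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    observed = np.array([1.0, 2.0, 4.0, 8.0])
    descriptor_pred = np.array([1.0, 3.0, 3.0, 8.0])  # e.g. a descriptor-based model
    gcn_pred = np.array([1.0, 1.0, 5.0, 8.0])         # e.g. a GCN
    ensemble_pred = average_ensemble(descriptor_pred, gcn_pred)
    print("R2 (log10 scale):", r_squared(np.log10(observed), np.log10(ensemble_pred)))
    print("Within twofold:", fraction_within_twofold(observed, ensemble_pred))
```

In the toy data the two models err in opposite directions on the same compounds, so the averaged prediction matches the observations exactly. Real models rarely cancel this cleanly, but this is the sense in which the abstract describes the two representations as complementary.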


Source journal: AAPS Journal (Medicine, Pharmacy)

CiteScore: 7.80

Self-citation rate: 4.40%

Articles published: 109

Review time: 1 month

About the journal: The AAPS Journal, an official journal of the American Association of Pharmaceutical Scientists (AAPS), publishes novel and significant findings in the various areas of pharmaceutical sciences impacting human and veterinary therapeutics, including:

· Drug Design and Discovery
· Pharmaceutical Biotechnology
· Biopharmaceutics, Formulation, and Drug Delivery
· Metabolism and Transport
· Pharmacokinetics, Pharmacodynamics, and Pharmacometrics
· Translational Research
· Clinical Evaluations and Therapeutic Outcomes
· Regulatory Science

We invite submissions under the following article types:

· Original Research Articles
· Reviews and Mini-reviews
· White Papers, Commentaries, and Editorials
· Meeting Reports
· Brief/Technical Reports and Rapid Communications
· Regulatory Notes
· Tutorials
· Protocols in the Pharmaceutical Sciences

In addition, The AAPS Journal publishes themes, organized by guest editors, which are focused on particular areas of current interest to our field.