James A. Lumley, David Fallon, Ryan Whatling, Damien Coupry, Andrew Brown
Computational Toxicology, Volume 31, Article 100324. Published 2024-07-22. DOI: 10.1016/j.comtox.2024.100324. https://www.sciencedirect.com/science/article/pii/S2468111324000264
vEXP: A virtual enhanced cross screen panel for off-target pharmacology alerts
We describe the development of the GSK vEXP (virtual enhanced cross screen panel) for off-target pharmacology alerts. A panel of machine learning classification models, or QSAR (Quantitative Structure-Activity Relationship) models, for off-target safety assessment allows early alerting to risk factors in candidate drugs. The models are matched to an internal in-vitro biochemical screening panel described previously, with some updates reported here. When potency thresholds relevant to in-vitro screening are applied, some internal GSK datasets and most of the related external ChEMBL datasets prove to be extremely imbalanced. The small size and bias towards the active class make many ChEMBL datasets un-modellable using such thresholds. Although larger, many GSK datasets remain too imbalanced to give a performant model. Merging internal and external data helps rebalance datasets and improve the domain of applicability, and model performance frequently improves on merged data. Efforts to collate public datasets with a far better balance of the missing inactives would likely do more to improve open-source models than simply increasing dataset size. We investigate moving the probability threshold and applying imbalanced learners to help overcome the imbalance problem. Both methods can produce models with improved performance when applied to imbalanced datasets. Datasets with a class imbalance of 95:5 or with fewer than 100 compounds were un-modellable. Where datasets had a class imbalance of 90:10, the imbalanced learning methods were often more performant than standard tree-based classifiers. No one classification algorithm consistently outperformed all others, and our approach emphasises a standardised, automated build-and-evaluate process across all classifiers to identify the best model. Applications of vEXP include ranking hit compounds for fast prioritisation, flagging hit series that contain systematic scaffold or functional group related risks, and confirming that late-stage optimisation is not introducing new off-target activities in established chemical series.
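The two ideas in the abstract that are easiest to make concrete are the dataset triage (size and class balance) and moving the probability threshold. The following is a minimal sketch in plain Python, not the authors' pipeline. The function names, the choice of balanced accuracy as the selection metric, and the grid of candidate thresholds are all assumptions made for illustration; the abstract does not state which metric or search procedure was used. Treating the reported "95:5 or fewer than 100 compounds" observation as a hard filter is likewise an assumption.

```python
from typing import Optional, Sequence, Tuple


def looks_modellable(labels: Sequence[int],
                     min_size: int = 100,
                     min_minority_pct: int = 5) -> bool:
    """Hypothetical triage rule based on the figures quoted in the abstract.

    Assumption: a dataset is skipped if it has fewer than `min_size`
    compounds, or if its rarer class makes up `min_minority_pct` percent
    or less of the data (i.e. an imbalance of 95:5 or worse).
    Labels are 1 for active and 0 for inactive.
    """
    n = len(labels)
    actives = sum(labels)
    minority = min(actives, n - actives)
    # Integer comparison avoids floating-point edge cases at exactly 5%.
    return n >= min_size and minority * 100 > min_minority_pct * n


def balanced_accuracy(labels: Sequence[int],
                      probs: Sequence[float],
                      threshold: float) -> float:
    """Mean of sensitivity and specificity at a given probability threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probs):
        predicted_active = p >= threshold
        if y == 1 and predicted_active:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_active:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def move_threshold(labels: Sequence[int],
                   probs: Sequence[float],
                   candidates: Optional[Sequence[float]] = None
                   ) -> Tuple[float, float]:
    """Pick the candidate threshold that maximises balanced accuracy.

    Assumption: thresholds 0.01 .. 0.99 are scanned; ties go to the
    lowest threshold. `probs` are predicted probabilities of the active
    class from any classifier, ideally on held-out validation data.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    best = max(candidates, key=lambda t: balanced_accuracy(labels, probs, t))
    return best, balanced_accuracy(labels, probs, best)
```

On an imbalanced dataset a classifier often assigns the rare class probabilities well below 0.5, so the default cut-off predicts almost everything as the majority class. Calling `move_threshold(y_valid, p_valid)` on a validation split (both names hypothetical) returns a cut-off that trades some specificity for sensitivity. The threshold should be chosen on data that is not used for the final performance estimate, otherwise the reported performance will be optimistic.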
About the journal:
Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation of new and existing chemical substances, assisting in their safety assessment.
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE, Multi-Scale Models
- Big Data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- From metals, to small organic molecules to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia, NGOs