Uncovering blood-brain barrier permeability: a comparative study of machine learning models using molecular fingerprints, and SHAP explainability.

IF 2.3 · Zone 3 (Environmental Science and Ecology) · Q3 CHEMISTRY, MULTIDISCIPLINARY
E Raveendrakumar, B Gopichand, H Bhosale, N Melethadathil, J Valadi
Journal: SAR and QSAR in Environmental Research, vol. 35, no. 12, pp. 1155-1171
Publication date: 2024-12-01 (Epub 2025-01-08)
Publication type: Journal Article
DOI: https://doi.org/10.1080/1062936X.2024.2446352

Abstract

This study illustrates the use of chemical fingerprints with machine learning to predict blood-brain barrier (BBB) permeability. From the Blood-Brain Barrier Database (B3DB), we extracted nine different fingerprints. Support Vector Machine (SVM) and Extreme Gradient Boosting (XGBoost) algorithms were used to develop the prediction models, and the Random Forest recursive Feature Selection (RF-RFS) method was used to extract informative attributes. An additional database was employed for the validation phase. The results indicate that models built on all nine fingerprint datasets performed well in the training, test and validation stages. We then selected the MACCS keys fingerprint model, one of the best performing, for explainability analysis, applying SHapley Additive exPlanations (SHAP) to identify the key structural features influencing BBB permeability prediction. These features include aliphatic carbons, methyl groups and oxygen-containing groups. The study highlights the effectiveness of different fingerprint descriptors in predicting BBB permeability, and the SHAP analysis adds interpretive value to the models. These models should be of significant help in drug discovery, particularly in developing Central Nervous System (CNS) therapeutics.
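
The abstract describes using SHAP to attribute a model's prediction to individual fingerprint bits. The sketch below illustrates the underlying idea only: it computes exact Shapley values by enumerating feature coalitions for a toy scoring function. The bit names, weights, interaction term and all-zero baseline are hypothetical assumptions chosen for illustration. They are not values from the study, and this is not the authors' pipeline or the SHAP library's implementation.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model over three binary fingerprint bits.
# Names and weights are illustrative assumptions, not results from the study.
WEIGHTS = {"methyl": 0.4, "aliphatic_carbon": 0.3, "oxygen_group": -0.5}
BASELINE = {"methyl": 0, "aliphatic_carbon": 0, "oxygen_group": 0}  # reference: all bits off


def model(bits):
    """Toy permeability score: weighted sum of bits plus one interaction term."""
    score = sum(WEIGHTS[name] * bits[name] for name in WEIGHTS)
    # Assumed interaction: methyl and aliphatic carbon together add a small bonus.
    score += 0.2 * bits["methyl"] * bits["aliphatic_carbon"]
    return score


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every coalition of the other features."""
    names = list(instance)
    n = len(names)
    values = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                # Coalition members take the instance's value; the rest stay at baseline.
                without = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
                with_target = dict(without)
                with_target[target] = instance[target]
                total += weight * (predict(with_target) - predict(without))
        values[target] = total
    return values


molecule = {"methyl": 1, "aliphatic_carbon": 1, "oxygen_group": 1}
phi = shapley_values(model, molecule, BASELINE)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the molecule's score and the baseline score, which is the additivity property that gives SHAP its name. Exact enumeration scales exponentially with the number of features, so for fingerprints with many bits an approximate or tree-specific explainer would be needed in practice.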

Source journal
CiteScore: 5.20
Self-citation rate: 20.00%
Articles published: 78
Review time: >24 weeks
Journal description: SAR and QSAR in Environmental Research is an international journal welcoming papers on the fundamental and practical aspects of the structure-activity and structure-property relationships in the fields of environmental science, agrochemistry, toxicology, pharmacology and applied chemistry. A unique aspect of the journal is the focus on emerging techniques for the building of SAR and QSAR models in these widely varying fields. The scope of the journal includes, but is not limited to, the topics of topological and physicochemical descriptors, mathematical, statistical and graphical methods for data analysis, computer methods and programs, original applications and comparative studies. In addition to primary scientific papers, the journal contains reviews of books and software and news of conferences. Special issues on topics of current and widespread interest to the SAR and QSAR community will be published from time to time.