Graph neural networks are promising for phenotypic virtual screening on cancer cell lines.

Biology Methods and Protocols (JCR category: Biochemical Research Methods, Q3; impact factor 2.5), volume 9, issue 1, article bpae065
Published: 2024-09-03; eCollection date: 2024-01-01; DOI: 10.1093/biomethods/bpae065
Sachin Vishwakarma, Saiveth Hernandez-Hernandez, Pedro J Ballester
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11537795/pdf/
Citations: 0

Abstract

Artificial intelligence is increasingly driving early drug design, offering novel approaches to virtual screening (VS). Phenotypic virtual screening (PVS) aims to predict how cancer cell lines respond to different compounds by focusing on observable characteristics rather than specific molecular targets. Some studies have suggested that deep learning may not be the best approach for PVS. However, these studies are limited by the small number of tested molecules, by not employing suitable performance metrics, and by not using dissimilar-molecules splits, which better mimic the challenging chemical diversity of real-world screening libraries. Here we prepared 60 datasets, each containing approximately 30 000 to 50 000 molecules tested for their growth inhibitory activity on one of the NCI-60 cancer cell lines. We conducted multiple performance evaluations of each of five machine learning algorithms for PVS on these 60 problem instances. To provide an even more comprehensive evaluation, we used two model validation types: the random split and the dissimilar-molecules split. Overall, about 14 440 training runs across datasets were carried out per algorithm. The models were primarily evaluated using hit rate, a more suitable metric in VS contexts. The results show that all models are more challenged by test molecules that are substantially different from those in the training data. In both validation types, the D-MPNN (directed message-passing neural network) algorithm, a graph-based deep neural network, was found to be the most suitable for building predictive models for this PVS problem.
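The abstract names hit rate as the primary metric but does not define it. The sketch below assumes the common virtual-screening definition: the fraction of truly active molecules among the top-k molecules ranked by a model's predicted activity. The paper's exact cutoff k and the activity threshold used to label molecules as active are not stated in this abstract; the function name `hit_rate_at_k` and the toy data are hypothetical.

```python
from typing import Sequence


def hit_rate_at_k(scores: Sequence[float], is_active: Sequence[bool], k: int) -> float:
    """Fraction of true actives among the k molecules with the highest predicted scores.

    Assumption: higher score means the model predicts stronger growth inhibition,
    and is_active holds labels derived from some activity threshold (not given here).
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of molecules")
    # Rank molecule indices by predicted score, highest first.
    # sorted() is stable even with reverse=True, so ties keep their input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Count how many of the k selected molecules are truly active.
    hits = sum(1 for i in ranked[:k] if is_active[i])
    return hits / k


if __name__ == "__main__":
    # Hypothetical toy screen: 5 molecules, 2 of them active (base rate 0.4).
    scores = [0.9, 0.2, 0.75, 0.1, 0.6]
    labels = [True, False, False, False, True]
    print(hit_rate_at_k(scores, labels, 2))  # top-2 are molecules 0 and 2 -> 0.5
```

A model is useful for screening when its hit rate at k clearly exceeds the base rate of actives in the library, which is what a random selection of k molecules would achieve on average.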
