癌症合成致死率预测机器学习方法基准测试

IF 14.7 1区综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES

Nature Communications Pub Date : 2024-10-20 DOI:10.1038/s41467-024-52900-7

Yimiao Feng, Yahui Long, He Wang, Yang Ouyang, Quan Li, Min Wu, Jie Zheng

{"title":"癌症合成致死率预测机器学习方法基准测试","authors":"Yimiao Feng, Yahui Long, He Wang, Yang Ouyang, Quan Li, Min Wu, Jie Zheng","doi":"10.1038/s41467-024-52900-7","DOIUrl":null,"url":null,"abstract":"<p>Synthetic lethality (SL) is a gold mine of anticancer drug targets, exposing cancer-specific dependencies of cellular survival. To complement resource-intensive experimental screening, many machine learning methods for SL prediction have emerged recently. However, a comprehensive benchmarking is lacking. This study systematically benchmarks 12 recent machine learning methods for SL prediction, assessing their performance across diverse data splitting scenarios, negative sample ratios, and negative sampling techniques, on both classification and ranking tasks. We observe that all the methods can perform significantly better by improving data quality, e.g., excluding computationally derived SLs from training and sampling negative labels based on gene expression. Among the methods, SLMGAE performs the best. Furthermore, the methods have limitations in realistic scenarios such as cold-start independent tests and context-specific SLs. These results, together with source code and datasets made freely available, provide guidance for selecting suitable methods and developing more powerful techniques for SL virtual screening.</p>","PeriodicalId":19066,"journal":{"name":"Nature Communications","volume":null,"pages":null},"PeriodicalIF":14.7000,"publicationDate":"2024-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Benchmarking machine learning methods for synthetic lethality prediction in cancer\",\"authors\":\"Yimiao Feng, Yahui Long, He Wang, Yang Ouyang, Quan Li, Min Wu, Jie Zheng\",\"doi\":\"10.1038/s41467-024-52900-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Synthetic lethality (SL) is a gold mine of anticancer drug targets, exposing cancer-specific dependencies of cellular survival. To complement resource-intensive experimental screening, many machine learning methods for SL prediction have emerged recently. However, a comprehensive benchmarking is lacking. This study systematically benchmarks 12 recent machine learning methods for SL prediction, assessing their performance across diverse data splitting scenarios, negative sample ratios, and negative sampling techniques, on both classification and ranking tasks. We observe that all the methods can perform significantly better by improving data quality, e.g., excluding computationally derived SLs from training and sampling negative labels based on gene expression. Among the methods, SLMGAE performs the best. Furthermore, the methods have limitations in realistic scenarios such as cold-start independent tests and context-specific SLs. These results, together with source code and datasets made freely available, provide guidance for selecting suitable methods and developing more powerful techniques for SL virtual screening.</p>\",\"PeriodicalId\":19066,\"journal\":{\"name\":\"Nature Communications\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":14.7000,\"publicationDate\":\"2024-10-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Nature Communications\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1038/s41467-024-52900-7\",\"RegionNum\":1,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Communications","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41467-024-52900-7","RegionNum":1,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}

引用次数: 0

摘要

合成致死（SL）是抗癌药物靶点的一座金矿，它揭示了癌症特异性的细胞存活依赖性。为了补充资源密集型实验筛选，最近出现了许多用于合成致死率预测的机器学习方法。然而，目前还缺乏全面的基准测试。本研究系统地对 12 种最新的 SL 预测机器学习方法进行了基准测试，评估了它们在不同的数据分割场景、负样本比率和负采样技术下，在分类和排序任务中的表现。我们发现，通过提高数据质量，例如从训练中排除计算得出的 SL 和基于基因表达的负标签采样，所有方法都能显著提高性能。在这些方法中，SLMGAE 的表现最好。此外，这些方法在冷启动独立测试和特定语境 SL 等现实场景中存在局限性。这些结果以及免费提供的源代码和数据集为选择合适的方法和开发更强大的 SL 虚拟筛选技术提供了指导。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Benchmarking machine learning methods for synthetic lethality prediction in cancer

查看原文本刊更多论文

Benchmarking machine learning methods for synthetic lethality prediction in cancer

Synthetic lethality (SL) is a gold mine of anticancer drug targets, exposing cancer-specific dependencies of cellular survival. To complement resource-intensive experimental screening, many machine learning methods for SL prediction have emerged recently. However, a comprehensive benchmarking is lacking. This study systematically benchmarks 12 recent machine learning methods for SL prediction, assessing their performance across diverse data splitting scenarios, negative sample ratios, and negative sampling techniques, on both classification and ranking tasks. We observe that all the methods can perform significantly better by improving data quality, e.g., excluding computationally derived SLs from training and sampling negative labels based on gene expression. Among the methods, SLMGAE performs the best. Furthermore, the methods have limitations in realistic scenarios such as cold-start independent tests and context-specific SLs. These results, together with source code and datasets made freely available, provide guidance for selecting suitable methods and developing more powerful techniques for SL virtual screening.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Nature Communications Biological Science Disciplines-

CiteScore

24.90

自引率

2.40%

发文量

6928

审稿时长

3.7 months

期刊介绍： Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.