Title: The specification game: rethinking the evaluation of drug response prediction for precision oncology
Authors: Francesco Codicè, Corrado Pancotti, Cesare Rollo, Yves Moreau, Piero Fariselli, Daniele Raimondi
Journal: Journal of Cheminformatics 17(1)
Published: 2025-03-14
DOI: 10.1186/s13321-025-00972-y
URL: https://link.springer.com/article/10.1186/s13321-025-00972-y
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00972-y
Abstract
Precision oncology plays a pivotal role in contemporary healthcare, aiming to optimize treatments for each patient based on their unique characteristics. This objective has spurred the emergence of various cancer cell line drug response datasets, which facilitate pre-clinical studies by exploring the impact of multi-omics data on drug response. Despite the proliferation of machine learning models for Drug Response Prediction (DRP), rigorous validation remains critical to reliably assess their usefulness for drug discovery and precision oncology, and their actual ability to generalize over the immense space of cancer cells and chemical compounds.
Scientific contribution: In this paper we show that commonly used evaluation strategies for DRP methods can be easily fooled by commonly occurring dataset biases, and are therefore unable to truly measure the ability of DRP methods to generalize over drugs and cell lines ("specification gaming"). This problem hinders the development of reliable DRP methods and their application to experimental pipelines. We propose a new validation protocol composed of three Aggregation Strategies (Global, Fixed-Drug, and Fixed-Cell Line), combined with three of the most commonly used train-test evaluation settings, to ensure a realistic assessment of prediction performance. We also scrutinize the challenges of using IC50 as a prediction label, showing how its close correlation with the drug concentration ranges worsens the risk of misleading performance assessment, and we give an additional reason to replace it with the Area Under the Dose-Response Curve.
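To make the difference between the three aggregation strategies concrete, the following is a minimal sketch on synthetic data, not the authors' implementation. It assumes that "Global" means one Pearson correlation over all drug/cell-line pairs, while "Fixed-Drug" and "Fixed-Cell Line" mean the average of correlations computed separately within each drug or each cell line; the abstract does not give the exact definitions. The function names, the synthetic effect sizes, and the per-drug mean baseline are all hypothetical choices for illustration, and the baseline is fitted on the same toy data rather than on a train-test split.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation, or NaN when it is undefined (constant input)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.size < 2 or a.max() == a.min() or b.max() == b.min():
        return float("nan")
    return float(np.corrcoef(a, b)[0, 1])


def aggregate_scores(y_true, y_pred, drugs, cells):
    """Score predictions under three assumed aggregation strategies."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    drugs = np.asarray(drugs)
    cells = np.asarray(cells)

    def grouped(keys):
        # One correlation per group, averaged over groups where it is defined.
        scores = [
            pearson(y_true[keys == k], y_pred[keys == k])
            for k in np.unique(keys)
        ]
        finite = [s for s in scores if not np.isnan(s)]
        return float(np.mean(finite)) if finite else float("nan")

    return {
        "global": pearson(y_true, y_pred),
        "fixed_drug": grouped(drugs),
        "fixed_cell_line": grouped(cells),
    }


# Synthetic full drug x cell-line grid (hypothetical sizes and effect scales).
rng = np.random.default_rng(0)
n_drugs, n_cells = 20, 50
drugs = np.repeat(np.arange(n_drugs), n_cells)
cells = np.tile(np.arange(n_cells), n_drugs)

# The response is dominated by a per-drug offset, mimicking a dataset bias
# such as drug-specific concentration ranges.
drug_effect = rng.normal(0.0, 3.0, n_drugs)
cell_effect = rng.normal(0.0, 0.5, n_cells)
y_true = drug_effect[drugs] + cell_effect[cells] + rng.normal(0.0, 0.5, drugs.size)

# Naive baseline: predict each drug's mean response and ignore the cell line.
drug_mean = np.array([y_true[drugs == d].mean() for d in range(n_drugs)])
y_pred = drug_mean[drugs]

scores = aggregate_scores(y_true, y_pred, drugs, cells)
print(scores)
```

In this toy setting the baseline uses no cell-line information at all, yet its global correlation is high because it captures the between-drug offsets. Its Fixed-Drug score is undefined (NaN), since its predictions are constant within every drug, which exposes that it cannot rank cell lines for a given compound.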
About the journal:
Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling.
Coverage includes, but is not limited to:
chemical information systems, software and databases, and molecular modelling,
chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases,
computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.